
[...] everyday language. It is often difficult to disambiguate similar entity classes, as they can have similar contexts and morphologies. For example, a simple heuristic for determining whether a term refers to a gene or a protein is that protein names begin with an upper-case letter (PspA) and gene names begin with a lower-case letter (pspA). This pattern is, however, not maintained consistently in scientific writing, and humans show substantial disagreement on this task,[17] with an average pair-wise agreement among three annotators of 77.58 per cent. The Drosophila melanogaster literature is probably the best example of the problems that exist regarding nomenclatures. Some Drosophila genes are named after their associated phenotype, such as eyeless or fruity, which makes it difficult to tell whether the phenotype or the gene is being described. Gene names such as Not and That also exist, which are homonymous with common English words (see Table 1). Some gene names are multi-word names, such as Mind the gap and IL-2 receptor. In the last case, problems detecting the correct boundary may lead to the entity being tagged as IL-2, which completely alters the meaning of the entity.[18]

Table 1. Table of linguistic terms. Definitions obtained from the Oxford Dictionary and WordNet

| Term | Meaning |
| Anaphor | A word or phrase that refers back to an earlier word or phrase |
| Polysemy | The coexistence of many possible meanings for a word or phrase |
| Homonymy | Each of two or more words having the same spelling and pronunciation but different meanings and origins |
| Semantics | Relating to meaning in language or logic |
| Syntax | The arrangement of words and phrases to create well-formed sentences in a language |
| Part of speech | One of the traditional categories of words intended to reflect their functions in a grammatical context |

© HENRY STEWART PUBLICATIONS 1479-7364. HUMAN GENOMICS. VOL 5. NO 1. 17-29 OCTOBER
REVIEW: Text mining for genomics and systems biology

Table 2. Some freely available software for NLP tasks in the biological domain. Task refers to the part of a text-mining pipeline that the software can be used for. Abbreviations: NER, named entity recognition; POS, part of speech tagger; PPI, protein-protein interaction extraction; SEN, sentencisation

| Name | URL | Task |
| AbGene | ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/AbGene | NER |
| ABNER [29] | http://pages.cs.wisc.edu/~bsettles/abner/ | NER |
| AkanePPI [30] | http://www-tsujii.is.s.u-tokyo.ac.jp/~satre/akane/ | PPI |
| BANNER [31] | http://banner.sourceforge.net/ | NER |
| BioTagger-GM | http://www.seas.upenn.edu/~strctlrn/BioTagger/BioTagger.html | NER |
| GENIA [32] | http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ | NER |
| Graph Kernel [33] | http://mars.cs.utu.fi/PPICorpora/GraphKernel.html | PPI |
| LINNAEUS [34] | http://linnaeus.sourceforge.net/ | NER |
| JNET | http://julielab.de/ | NER |
| JSBD | http://julielab.de/ | SEN |
| LingPipe | http://alias-i.com/lingpipe/ | NER |
| MedPost [36] | http://ii-public.nlm.nih.gov/MMTx/MedPost_SKR.shtml | POS |
| NLProt [37] | http://cubic.bioc.columbia.edu/services/nlprot/ | NER |
| OpenDMAP [38] | http://opendmap.sourceforge.net/ | PPI |
| OSCAR3 [39] | http://sourceforge.net/projects/oscar3-chem/ | NER |
| POSBIOTM-NER [40] | http://isoft.postech.ac.kr/Research/BioNER/POSBIOTM/NER/main.html | NER |
| Sptoolkit | http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector | SEN |
| Whatizit | http://www.ebi.ac.uk/webservices/whatizit/ | NER/PPI |

A variety of methods have been proposed for biological NER (see Table 2), with only a small portion freely available for download or publicly accessible via web servers/services. These tools fall into four main categories: dictionary-bas[...]
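
The capitalisation heuristic for separating gene and protein names, described earlier in the text, can be written down in a few lines. The sketch below is only an illustration of that rule of thumb; the function name `guess_entity_class` is hypothetical and does not come from any of the tools listed in Table 2. As the text notes, the convention is not applied consistently in the literature, so a rule like this should be expected to mislabel many real mentions.

```python
def guess_entity_class(term: str) -> str:
    """Guess 'protein' or 'gene' from the case of the first letter.

    Hypothetical helper: implements only the naive rule that protein
    names start with an upper-case letter and gene names with a
    lower-case letter. Returns 'unknown' if the term has no letters.
    """
    # Find the first alphabetic character, skipping digits and symbols.
    first_alpha = next((ch for ch in term if ch.isalpha()), None)
    if first_alpha is None:
        return "unknown"
    return "protein" if first_alpha.isupper() else "gene"


# The example pair from the text.
print(guess_entity_class("PspA"))     # protein
print(guess_entity_class("pspA"))     # gene
# A phenotype-derived gene name: the rule says 'gene', but it cannot
# tell whether the gene or the phenotype itself is being discussed.
print(guess_entity_class("eyeless"))  # gene
```

The rule inspects a single character and ignores context entirely, which is one way to see why surface form alone gives no basis for separating the gene from the phenotype in a name such as eyeless.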

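
The boundary problem described for IL-2 receptor can be illustrated with a toy dictionary lookup. The sketch below assumes a tiny hand-made lexicon and a hypothetical function `tag_entities`; it is not the method of any tool in Table 2. It shows that a tagger which accepts the first (shortest) dictionary match stops at IL-2, whereas one that prefers the longest match recovers the full entity.

```python
def tag_entities(text, lexicon, max_len=3, prefer_longest=True):
    """Return lexicon entries found in text, scanning left to right.

    Hypothetical sketch: tokens are split on whitespace and stripped of
    surrounding punctuation; at each position, spans of 1..max_len
    tokens are looked up in the lexicon.
    """
    tokens = [t.strip(".,;:()") for t in text.split()]
    tokens = [t for t in tokens if t]
    found = []
    i = 0
    while i < len(tokens):
        lengths = range(1, max_len + 1)
        if prefer_longest:
            lengths = reversed(lengths)  # try the longest span first
        for n in lengths:
            if i + n > len(tokens):
                continue  # span would run past the end of the text
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                found.append(candidate)
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no entry starts at this token
    return found


# Toy lexicon (an assumption for illustration only).
lexicon = {"IL-2", "IL-2 receptor"}
sentence = "Expression of the IL-2 receptor was reduced."

print(tag_entities(sentence, lexicon, prefer_longest=True))   # ['IL-2 receptor']
print(tag_entities(sentence, lexicon, prefer_longest=False))  # ['IL-2']
```

Longest-match lookup only helps when the longer name is in the lexicon; if the lexicon lacks IL-2 receptor, both settings return IL-2.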