Pathogens and gene product normalization in the biomedical literature

Vishnyakova, Dina ; Pasche, Emilie ; Teodoro, Douglas ; Lovis, Christian ; Ruch, Patrick

In: Studies in health technology and informatics, 2012, vol. 174, p. 89-93

    Summary
    We propose an approach to pathogen and gene product normalization in the biomedical literature. The approach is motivated by needs such as literature curation, applied in particular to the field of infectious diseases, where both bacterial species (S. aureus, Staphylococcus aureus…) and their gene products (protein ArsC, Arsenical pump modifier, Arsenate reductase…) appear under many variant names. Our approach is based on an Ontology Look-up Service (OLS), a Gene Ontology Categorizer (GOCat) and Gene Normalization (GN) methods. In the pathogen detection task, OLS is used to disambiguate the pathogen names found. GOCat results are incorporated into the overall scoring system to support and confirm decision-making in the normalization of pathogens and their genomes. The evaluation was done on two test sets of the BioCreative III (BCIII) benchmark: a gold standard of manually curated articles (50 articles) and a silver standard (507 articles) curated from the collective results of BCIII participants. For cross-species GN we achieved a precision of 46% on the silver set and 27% on the gold set. Pathogen normalization reached 95% precision and 93% recall. GOCat explicitly improves the results of pathogen and gene normalization, essentially by confirming identified pathogens and boosting correct gene identifiers to the top of the confidence-ranked result list. Correct identification of the pathogen can significantly improve normalization effectiveness and resolve gene ambiguity.
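The summary describes a scoring system in which pathogen identification and GOCat evidence push correct gene identifiers toward the top of a confidence-ranked list, but it does not give the formula. The following is only a minimal sketch of that idea, assuming a simple linear combination: the class `GeneCandidate`, the function `rerank`, the `go_weight` parameter, the placeholder identifiers and all numeric scores are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeneCandidate:
    gene_id: str     # placeholder identifier, not a real database ID
    species: str     # species the candidate gene belongs to
    gn_score: float  # base gene normalization confidence, assumed in [0, 1]


def rerank(candidates, pathogen_scores, go_support, go_weight=0.5):
    """Rank candidates by an assumed combined confidence.

    combined = gn_score * pathogen confidence + go_weight * GO support
    Missing pathogen or GO evidence counts as 0.0.
    """
    ranked = []
    for c in candidates:
        species_conf = pathogen_scores.get(c.species, 0.0)
        support = go_support.get(c.gene_id, 0.0)
        ranked.append((c.gn_score * species_conf + go_weight * support, c.gene_id))
    # Highest combined score first; ties broken by identifier for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


# Illustrative inputs only (invented values).
candidates = [
    GeneCandidate("GENE_A", "Staphylococcus aureus", 0.6),
    GeneCandidate("GENE_B", "Escherichia coli", 0.7),
]
pathogen_scores = {"Staphylococcus aureus": 0.9, "Escherichia coli": 0.2}
go_support = {"GENE_A": 0.8}

ranked = rerank(candidates, pathogen_scores, go_support)
for score, gene_id in ranked:
    print(f"{gene_id}\t{score:.2f}")
```

In this toy input, the candidate with the lower base score ends up first because its species was confidently identified and it has supporting Gene Ontology evidence, which mirrors the boosting effect the summary reports.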