Javier Tamames
Alma Bioinformatics S.L., Ronda de Poniente 4, 28750 Tres Cantos, Madrid, Spain.
BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S10. doi: 10.1186/1471-2105-6-S1-S10. Epub 2005 May 24.
The identification of mentions of genes or gene products in biomedical texts is a critical step in the development of text-mining applications in the biosciences. The complexity and ambiguity of gene nomenclature make this a very difficult task.
Here we present a novel approach based on a combination of carefully designed rules and several lexicons of biological concepts, implemented in the Text Detective system. Text Detective can also normalize the gene mentions it finds by assigning each one the appropriate database reference.
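The abstract does not describe the rules or lexicons themselves, so the following is only a minimal sketch of the general idea: look up candidate phrases in a gene lexicon (longest match first), veto ambiguous matches with a hand-written rule, and attach a database reference to each accepted mention. The lexicon entries, the `DB:` identifiers, the common-word filter, and the names `tag_genes` and `passes_rules` are all hypothetical and are not Text Detective's actual design.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: lower-cased synonym -> placeholder database ID.
GENE_LEXICON = {
    "p53": "DB:0001",
    "tumor protein p53": "DB:0001",
    "brca1": "DB:0002",
    "was": "DB:0003",  # a gene symbol that collides with an English word
}
# Hypothetical lexicon of common English words that clash with gene symbols.
COMMON_WORDS = {"was", "can", "for", "not"}
# Longest lexicon entry, measured in tokens.
MAX_NGRAM = max(len(k.split()) for k in GENE_LEXICON)
# Tokens: alphanumeric runs, optionally joined by internal hyphens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


@dataclass
class Mention:
    text: str   # surface string as it appears in the input
    start: int  # character offset of the first character
    end: int    # character offset just past the last character
    db_id: str  # normalized (placeholder) database reference


def passes_rules(surface: str) -> bool:
    """Hypothetical rule: a match that is also a common English word is
    kept only when written entirely in upper case (e.g. 'WAS', not 'was')."""
    if surface.lower() in COMMON_WORDS:
        return surface.isupper()
    return True


def tag_genes(text: str) -> list[Mention]:
    tokens = list(TOKEN_RE.finditer(text))
    mentions: list[Mention] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest n-gram first so 'tumor protein p53' beats 'p53'.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            key = " ".join(t.group().lower() for t in window)
            db_id = GENE_LEXICON.get(key)
            start, end = window[0].start(), window[-1].end()
            surface = text[start:end]
            if db_id is not None and passes_rules(surface):
                mentions.append(Mention(surface, start, end, db_id))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return mentions


if __name__ == "__main__":
    sentence = ("Mutations in tumor protein p53 and BRCA1 were studied; "
                "the WAS gene was also examined.")
    for m in tag_genes(sentence):
        print(m.text, m.db_id)
```

On the example sentence, the sketch tags "tumor protein p53", "BRCA1" and the upper-case "WAS", but ignores the verb "was". This shows why lexicon lookup alone is not enough and why rules are needed to handle ambiguous gene names.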
In the BioCreAtIvE evaluation, Text Detective achieved 84% precision and 71% recall for task 1A (gene mention identification), and 79% precision and 71% recall for mouse genes in task 1B (gene normalization).
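The abstract reports precision and recall separately. As an illustration, they can be combined into the balanced F-measure (F1), the harmonic mean $F_1 = 2PR/(P+R)$. The F1 values computed below are derived from the rounded percentages in the abstract. They are not scores reported by the author or by the official BioCreAtIvE evaluation.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall figures as reported in the abstract (as fractions).
reported = {
    "task 1A": (0.84, 0.71),
    "task 1B (mouse)": (0.79, 0.71),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f_measure(p, r):.3f}")
```

This gives an F1 of roughly 0.77 for task 1A and 0.75 for mouse genes in task 1B. The harmonic mean is pulled toward the lower of the two figures, which here is recall.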