Wei Chih-Hsuan, Kao Hung-Yu, Lu Zhiyong
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, United States of America.
PLoS One. 2012;7(6):e38460. doi: 10.1371/journal.pone.0038460. Epub 2012 Jun 5.
As suggested in recent studies, species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications, such as gene normalization and protein-protein interaction extraction. We report SR4GN, an open-source tool for species recognition and disambiguation in biomedical text. In addition to the species detection function found in existing tools, SR4GN is optimized for the gene normalization task. As such, it is developed to link detected species with corresponding gene mentions in a document. SR4GN achieves an accuracy of 85.42% and compares favorably to other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications. SR4GN can be downloaded at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4GN.