BioTagger-GM: a gene/protein name recognition system.

Author information

Torii Manabu, Hu Zhangzhi, Wu Cathy H, Liu Hongfang

Affiliation

The Imaging Science and Information Systems Center, Department of Oncology, Georgetown University Medical Center, 2115 Wisconsin Avenue NW, Washington, DC 20057, USA.

Publication information

J Am Med Inform Assoc. 2009 Mar-Apr;16(2):247-55. doi: 10.1197/jamia.M2844. Epub 2008 Dec 11.

DOI:10.1197/jamia.M2844

PMID:19074302

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2649315/

Abstract

OBJECTIVES

Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the different entity types in this domain, genes/proteins are arguably the most studied for BNER. Our goal is to develop BioTagger-GM, a gene/protein name recognition system that exploits the rich information in terminology sources using powerful machine learning frameworks and system combination.

DESIGN

BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in the UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using the dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine the recognition results of multiple systems.
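The dictionary lookup and system combination components can be illustrated with a rough sketch. This is not the BioTagger-GM implementation; it is a minimal example under stated assumptions: the toy dictionary `GENE_DICT` is hypothetical and stands in for BioThesaurus (UMLS lookup is omitted), greedy longest match and the `B-DICT`/`I-DICT`/`O` tag names are illustrative choices, and `majority_vote` assumes a simple strict-majority rule over exact token spans, because the abstract does not specify how the voting works.

```python
from collections import Counter

# Hypothetical toy dictionary standing in for BioThesaurus; entries are
# lowercased token tuples. The real resource is vastly larger.
GENE_DICT = {("p53",), ("tumor", "necrosis", "factor"), ("bcl-2",)}
MAX_LEN = max(len(entry) for entry in GENE_DICT)


def dictionary_features(tokens):
    """Tag tokens covered by dictionary entries using greedy longest match.

    Returns one tag per token ("B-DICT", "I-DICT" or "O"); the tag sequence
    could serve as one feature column for a machine learning tagger.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in GENE_DICT:
                match_len = n
                break
        if match_len:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + match_len):
                tags[j] = "I-DICT"
            i += match_len
        else:
            i += 1
    return tags


def majority_vote(system_outputs):
    """Keep a mention span only if more than half of the systems predicted it.

    Each element of system_outputs is a set of (start, end) token spans,
    end-exclusive; spans must match exactly to share a vote.
    """
    counts = Counter(span for spans in system_outputs for span in spans)
    threshold = len(system_outputs) / 2
    return {span for span, votes in counts.items() if votes > threshold}


if __name__ == "__main__":
    tokens = ["Expression", "of", "tumor", "necrosis", "factor", "and", "p53"]
    print(dictionary_features(tokens))
    print(majority_vote([{(0, 2), (5, 6)}, {(0, 2)}, {(0, 2), (3, 4)}]))
```

Running the script prints `['O', 'O', 'B-DICT', 'I-DICT', 'I-DICT', 'O', 'B-DICT']` and `{(0, 2)}`: with three systems, a span needs at least two votes to survive. A strict majority tends to trade recall for precision; lowering the threshold would shift the balance the other way.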

MEASUREMENTS

The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. The performance of the systems was evaluated through cross-validation tests and measured using precision, recall, and F-Measure.
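Precision, recall, and F-Measure relate as P = TP / (TP + FP), R = TP / (TP + FN), and F = 2PR / (P + R). The sketch below assumes exact span matching and pools true positive, false positive, and false negative counts across cross-validation folds (micro-averaging). Both choices, and the `(sentence_id, start, end)` span representation, are assumptions for illustration; official shared-task scoring may differ, for example in how alternative mention boundaries are treated.

```python
def pooled_prf(folds):
    """Micro-averaged precision, recall and balanced F-measure.

    folds is a list of (gold, predicted) pairs, one per cross-validation
    fold; each is a set of mention spans such as (sentence_id, start, end).
    Only exact span matches count as true positives.
    """
    tp = fp = fn = 0
    for gold, predicted in folds:
        tp += len(gold & predicted)  # predicted spans found in the gold standard
        fp += len(predicted - gold)  # spurious predictions
        fn += len(gold - predicted)  # missed gold mentions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    folds = [
        ({("s1", 0, 2), ("s1", 5, 6)}, {("s1", 0, 2), ("s1", 3, 4)}),
        ({("s2", 1, 3)}, {("s2", 1, 3)}),
    ]
    print(pooled_prf(folds))
```

Pooling counts weights every mention equally, whereas averaging per-fold F-scores weights every fold equally; the two can differ when folds contain different numbers of mentions.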

RESULTS

BioTagger-GM achieved an F-Measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus.

CONCLUSION

The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
