BioTagger-GM: a gene/protein name recognition system.

Authors

Torii Manabu, Hu Zhangzhi, Wu Cathy H, Liu Hongfang

Affiliation

The Imaging Science and Information Systems Center, Department of Oncology, Georgetown University Medical Center, 2115 Wisconsin Avenue NW, Washington, DC 20057, USA.

Publication information

J Am Med Inform Assoc. 2009 Mar-Apr;16(2):247-55. doi: 10.1197/jamia.M2844. Epub 2008 Dec 11.

DOI:10.1197/jamia.M2844
PMID:19074302
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2649315/
Abstract

OBJECTIVES

Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the entity types in the domain, genes/proteins are probably the most studied in BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits the rich information in terminology sources using powerful machine learning frameworks and system combination.

DESIGN

BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in the UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using the dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine recognition results from multiple systems.
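
Components (1), (2), and (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the greedy longest-match strategy, the feature names, and the vote threshold are all assumptions made for illustration, since the abstract does not specify the lookup algorithm or the voting rule. Spans are (start, end) token offsets with an exclusive end.

```python
from collections import Counter

# Hypothetical toy dictionary; the real system draws on BioThesaurus and the
# UMLS Metathesaurus, which are far larger and need more careful matching.
TOY_DICTIONARY = {("p53",), ("tumor", "necrosis", "factor")}
MAX_LEN = max(len(entry) for entry in TOY_DICTIONARY)


def dictionary_tags(tokens):
    """Greedy longest-match lookup; returns one B/I/O tag per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in TOY_DICTIONARY:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens):
    """Attach the dictionary tag to each token as one feature for a sequence labeler."""
    return [
        {"word": tok, "dict_tag": tag}
        for tok, tag in zip(tokens, dictionary_tags(tokens))
    ]


def vote(system_outputs, min_votes):
    """Keep every span proposed by at least min_votes systems (assumed voting rule)."""
    counts = Counter(span for output in system_outputs for span in set(output))
    return sorted(span for span, c in counts.items() if c >= min_votes)


tokens = "Expression of tumor necrosis factor and p53 was measured".split()
print(dictionary_tags(tokens))

# Outputs of three hypothetical recognizers for the same sentence.
outputs = [[(2, 5), (6, 7)], [(2, 5)], [(3, 5), (6, 7)]]
print(vote(outputs, min_votes=2))
```

In this sketch the dictionary tags are features rather than final decisions, so a trained model remains free to override a dictionary match. The simple vote does not resolve overlapping spans; a fuller system would need a rule for that case.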

MEASUREMENTS

The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. The performance of the systems was evaluated through cross-validation tests and measured using precision, recall, and F-Measure.
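
As a rough illustration of the three metrics, the sketch below scores predicted mentions against gold mentions by exact span match. The function name and the (sentence_id, start, end) span encoding are assumptions; shared-task scorers may apply additional matching rules that are not modeled here.

```python
def precision_recall_f(gold, predicted):
    """Exact-match precision, recall, and balanced F-measure over mention spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical spans encoded as (sentence_id, start, end).
gold = [(0, 2, 5), (0, 6, 7), (1, 0, 1)]
predicted = [(0, 2, 5), (1, 0, 1), (1, 3, 4), (0, 6, 8)]

p, r, f = precision_recall_f(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the four predictions match gold spans exactly, so precision is 2/4 and recall is 2/3. The prediction (0, 6, 8) overlaps a gold mention but has a different boundary, so under exact matching it counts as both a false positive and a missed mention.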

RESULTS

BioTagger-GM achieved an F-Measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus.

CONCLUSION

The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
