Exploring the boundaries: gene and protein identification in biomedical text.

Authors

Jenny Finkel, Shipra Dingare, Christopher D. Manning, Malvina Nissim, Beatrice Alex, Claire Grover

Affiliation

Department of Computer Science, Stanford University, Stanford, CA 94305-9040, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2105-6-S1-S5. Epub 2005 May 24.

DOI: 10.1186/1471-2105-6-S1-S5

PMID: 15960839

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1869019/

Abstract

BACKGROUND

Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.

METHODS

We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.
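As a rough illustration of what a maximum-entropy token classifier with hand-built features can look like, the sketch below trains a small multinomial logistic regression (the same model family as maximum entropy) on a toy corpus. It is not the authors' system: the feature templates in `token_features`, the `MaxEntTagger` class, the hyperparameters, and the three example sentences are all assumptions made up for this example, and the abstract does not describe the actual feature set or training procedure.

```python
import math
from collections import defaultdict


def token_features(tokens, i):
    """Hypothetical feature templates: word identity, word shape, suffix, neighbours."""
    w = tokens[i]
    shape = "".join(
        "A" if c.isupper() else "a" if c.islower() else "0" if c.isdigit() else "-"
        for c in w
    )
    return [
        "word=" + w.lower(),
        "shape=" + shape,
        "suffix3=" + w[-3:].lower(),
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
    ]


class MaxEntTagger:
    """Minimal maximum-entropy (softmax) classifier over binary string features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        # Softmax over per-label sums of active feature weights.
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=50, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the L2-regularised log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y, py in zip(self.labels, p):
                    grad = (1.0 if y == gold else 0.0) - py
                    for f in feats:
                        key = (f, y)
                        self.weights[key] += lr * (grad - l2 * self.weights[key])

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[p.index(max(p))]


# Toy training corpus (invented for this example, not BioCreative data).
corpus = [
    (["The", "BRCA1", "gene", "is", "mutated"], ["O", "GENE", "O", "O", "O"]),
    (["p53", "binds", "DNA"], ["GENE", "O", "O"]),
    (["Expression", "of", "IL-2", "was", "high"], ["O", "O", "GENE", "O", "O"]),
]
data = [
    (token_features(toks, i), tag)
    for toks, tags in corpus
    for i, tag in enumerate(tags)
]

tagger = MaxEntTagger(["O", "GENE"])
tagger.train(data)
predicted = [tagger.predict(feats) for feats, _ in data]
print(predicted)
```

Each token is classified independently here, which keeps the sketch short. A system that cares about entity boundaries would normally also condition on neighbouring labels or decode whole sequences.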

RESULTS

This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.
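For readers who want a single number per evaluation, precision and recall can be combined into the balanced F-score (their harmonic mean). The sketch below applies that standard formula to the rounded figures quoted above. The F-scores are not stated in this abstract, so the values are derived here and may differ slightly from any officially reported ones; the function name `f_score` is an assumption for this example.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives balanced F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract (already rounded to two decimals).
open_f1 = f_score(0.83, 0.84)
closed_f1 = f_score(0.78, 0.85)

print(f"open F1 = {open_f1:.3f}")      # open F1 = 0.835
print(f"closed F1 = {closed_f1:.3f}")  # closed F1 = 0.813
```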

CONCLUSION

The central contributions are the rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources, including full MEDLINE abstracts and web searches.





