Chemical Entity Recognition and Resolution to ChEBI.

Author information

Grego Tiago, Pesquita Catia, Bastos Hugo P, Couto Francisco M

Affiliation

Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal.

Publication information

ISRN Bioinform. 2012 Feb 15;2012:619427. doi: 10.5402/2012/619427. eCollection 2012.

DOI:10.5402/2012/619427

PMID:25937941

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4393067/

Abstract

Chemical entities are ubiquitous throughout the biomedical literature, and text-mining systems that can efficiently identify them are needed. Owing to the lack of available corpora and data resources, the community has focused its efforts on developing gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, chemical entity recognition can now be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution, and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, improving the F-measure by 20% for entity recognition, by 2-5% for entity resolution, and by 15% for combined entity recognition and resolution.
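The abstract describes resolution as mapping each recognized mention to a ChEBI entry by lexical similarity, and reports results as F-measure. The abstract does not specify the exact similarity measure or threshold, so the following is only a minimal sketch under stated assumptions: character-trigram Dice similarity is swapped in as the lexical measure, `LEXICON` is a tiny hypothetical excerpt of ChEBI names and synonyms, and the 0.5 threshold is arbitrary. The machine-learning recognition step is not shown; mentions are taken as given. The `prf` helper shows how precision, recall and F-measure are computed over (document, ChEBI ID) pairs.

```python
from typing import Optional

# Sketch only: trigram Dice is an assumed stand-in for the paper's lexical
# similarity measure; LEXICON is a hypothetical excerpt (verify IDs before reuse).
LEXICON: dict[str, list[str]] = {
    "CHEBI:16236": ["ethanol", "ethyl alcohol"],
    "CHEBI:15377": ["water"],
    "CHEBI:15365": ["acetylsalicylic acid", "aspirin"],
}


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient between the character n-gram sets of a and b."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def resolve(mention: str, lexicon: dict[str, list[str]],
            threshold: float = 0.5) -> tuple[Optional[str], float]:
    """Map a mention to the ChEBI ID whose name or synonym is most similar.

    Returns (None, best_score) when no name reaches the (hypothetical) threshold.
    """
    best_id, best_score = None, 0.0
    for chebi_id, names in lexicon.items():
        for name in names:
            score = dice(mention, name)
            if score > best_score:
                best_id, best_score = chebi_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


def prf(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure (F1) of predicted vs. gold annotations."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A misspelling such as "etanol" still lands on ethanol via shared trigrams.
    for mention in ["Ethanol", "etanol", "aspirin", "benzene"]:
        print(mention, resolve(mention, LEXICON))
```

Trigram overlap tolerates small spelling and case variants that an exact dictionary lookup would miss; a realistic version would load the full ChEBI name and synonym set and use an index instead of the linear scan shown here.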

Similar articles

Recognition of chemical entities: combining dictionary-based and grammar-based approaches.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S10. doi: 10.1186/1758-2946-7-S1-S10. eCollection 2015.

Biomedical named entity recognition using deep neural networks with contextual information.

BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.

A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S8. doi: 10.1186/1758-2946-7-S1-S8. eCollection 2015.

Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S9. doi: 10.1186/1758-2946-7-S1-S9. eCollection 2015.

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

J Biomed Semantics. 2016 Apr 27;7:22. doi: 10.1186/s13326-016-0059-z. eCollection 2016.

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning.

Database (Oxford). 2016 Apr 17;2016. doi: 10.1093/database/baw049. Print 2016.

Exploiting and assessing multi-source data for supervised biomedical named entity recognition.

Bioinformatics. 2018 Jul 15;34(14):2474-2482. doi: 10.1093/bioinformatics/bty152.

Enhancement of chemical entity identification in text using semantic similarity validation.

PLoS One. 2013 May 2;8(5):e62984. doi: 10.1371/journal.pone.0062984. Print 2013.

FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

BMC Bioinformatics. 2018 Jun 28;19(1):248. doi: 10.1186/s12859-018-2211-5.

Cited by

Genome-wide transcriptomics revealed carbon source-mediated gamma-aminobutyric acid (GABA) production in a probiotic, 9D3.

Heliyon. 2025 Jan 10;11(2):e41879. doi: 10.1016/j.heliyon.2025.e41879. eCollection 2025 Jan 30.

Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics.

J Biomed Semantics. 2024 Jul 31;15(1):13. doi: 10.1186/s13326-024-00314-1.

Enhancing Genome-Scale Model by Integrative Exometabolome and Transcriptome: Unveiling Carbon Assimilation towards Sphingolipid Biosynthetic Capability of .

J Fungi (Basel). 2022 Aug 22;8(8):887. doi: 10.3390/jof8080887.

Probing Genome-Scale Model Reveals Metabolic Capability and Essential Nutrients for Growth of Probiotic KUB-AC5.

Biology (Basel). 2022 Feb 11;11(2):294. doi: 10.3390/biology11020294.

NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.

Sci Data. 2021 Mar 25;8(1):91. doi: 10.1038/s41597-021-00875-1.

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning.

Database (Oxford). 2016 Apr 17;2016. doi: 10.1093/database/baw049. Print 2016.

Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S6. doi: 10.1186/1758-2946-7-S1-S6. eCollection 2015.

The CHEMDNER corpus of chemicals and drugs and its annotation principles.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S2. doi: 10.1186/1758-2946-7-S1-S2. eCollection 2015.

Annotated chemical patent corpus: a gold standard for text mining.

PLoS One. 2014 Sep 30;9(9):e107477. doi: 10.1371/journal.pone.0107477. eCollection 2014.

Chemical named entities recognition: a review on approaches and applications.

J Cheminform. 2014 Apr 28;6:17. doi: 10.1186/1758-2946-6-17. eCollection 2014.
