Chemical Entity Recognition and Resolution to ChEBI.

Authors

Grego Tiago, Pesquita Catia, Bastos Hugo P, Couto Francisco M

Affiliation

Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal.

Publication information

ISRN Bioinform. 2012 Feb 15;2012:619427. doi: 10.5402/2012/619427. eCollection 2012.

Abstract

Chemical entities are ubiquitous throughout the biomedical literature, and text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can now be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2-5% for the entity resolution task, and 15% for the combined entity recognition and resolution task.
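
The results above are reported as F-measure, the harmonic mean of precision and recall. As a minimal sketch of how such a score can be computed for combined recognition and resolution, the Python below compares predicted annotations against gold annotations. It assumes exact matching on character span and ChEBI identifier, which the abstract does not specify. The function name `f_measure` and the sample annotations are hypothetical and are not taken from the paper.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) for two collections of annotations.

    Assumption: an annotation counts as correct only if it matches a gold
    annotation exactly (same span and same identifier).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (start offset, end offset, ChEBI identifier).
gold = {
    (0, 7, "CHEBI:15377"),
    (12, 19, "CHEBI:16236"),
    (25, 32, "CHEBI:17234"),
}
predicted = {
    (0, 7, "CHEBI:15377"),    # correct span and identifier
    (12, 19, "CHEBI:16236"),  # correct span and identifier
    (40, 44, "CHEBI:15377"),  # spurious mention
}

precision, recall, f1 = f_measure(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Dropping the identifier from each tuple would score recognition alone, which is one way the separate recognition and resolution figures could be obtained under the same assumption.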


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec64/4393067/79445286e77b/ISRN.BIOINFORMATICS2012-619427.001.jpg
