Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task.

Authors

Wei Chih-Hsuan, Peng Yifan, Leaman Robert, Davis Allan Peter, Mattingly Carolyn J, Li Jiao, Wiegers Thomas C, Lu Zhiyong

Affiliations

National Center for Biotechnology Information, Bethesda, MD 20894, USA.

National Center for Biotechnology Information, Bethesda, MD 20894, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA.

Publication information

Database (Oxford). 2016 Mar 19;2016. doi: 10.1093/database/baw032. Print 2016.

Abstract

Manually curating chemicals, diseases and their relationships is highly important to biomedical research, but it is hampered by its high cost and the rapid growth of the biomedical literature. In recent years, there has been growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques needed to assess and advance the current state of the art. To this end, we organized a challenge task through BioCreative V to automatically extract CDRs from the literature. We designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus consisting of human annotations of chemicals, diseases and their interactions from 1500 PubMed articles. Thirty-four teams worldwide participated in the CDR task: 16 in DNER and 18 in CID. The best systems achieved an F-score of 86.46% for the DNER task, a result that approaches the human inter-annotator agreement (0.8875), and an F-score of 57.03% for the CID task, the highest results ever reported for such tasks. When team results were combined via machine learning, the ensemble system further improved over the best team results, achieving F-scores of 88.89% and 62.80% for the DNER and CID tasks, respectively. Another novel aspect of our evaluation was to test each participating system's ability to return real-time results: the average response times of the teams' DNER and CID web service systems were 5.6 and 9.3 s, respectively. Most teams used hybrid systems based on machine learning for their submissions. Given the level of participation and results, we found our task to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/

Similar articles

2. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database (Oxford). 2016 May 9;2016. doi: 10.1093/database/baw068. Print 2016.
3. HITSZ_CDR: an end-to-end chemical and disease relation extraction system for BioCreative V. Database (Oxford). 2016 Jun 5;2016. doi: 10.1093/database/baw077. Print 2016.
4. Extraction of chemical-induced diseases using prior knowledge and textual information. Database (Oxford). 2016 Apr 14;2016. doi: 10.1093/database/baw046. Print 2016.
9. Overview of the BioCreative III Workshop. BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-12-S8-S1.
10. AuDis: an automatic CRF-enhanced disease normalization in biomedical text. Database (Oxford). 2016 Jun 7;2016. doi: 10.1093/database/baw091. Print 2016.

Cited by

1. Enhancing biomedical relation extraction with directionality. Bioinformatics. 2025 Jul 1;41(Supplement_1):i68-i76. doi: 10.1093/bioinformatics/btaf226.
6. Sequence labeling via reinforcement learning with aggregate labels. Front Artif Intell. 2024 Nov 15;7:1463164. doi: 10.3389/frai.2024.1463164. eCollection 2024.
7. EnzChemRED, a rich enzyme chemistry relation extraction dataset. Sci Data. 2024 Sep 9;11(1):982. doi: 10.1038/s41597-024-03835-7.
9. VAIV bio-discovery service using transformer model and retrieval augmented generation. BMC Bioinformatics. 2024 Aug 21;25(1):273. doi: 10.1186/s12859-024-05903-6.
