Broad-coverage biomedical relation extraction with SemRep.

Affiliations

Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, IL 61820, USA.

Publication information

BMC Bioinformatics. 2020 May 14;21(1):188. doi: 10.1186/s12859-020-3517-7.

Abstract

BACKGROUND

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

RESULTS

A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F score. The recall and the F score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.
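The reported F scores can be checked against the reported precision and recall. The sketch below assumes the F score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant is used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract
reported = {
    "strict, manually annotated": (0.55, 0.34),
    "relaxed, manually annotated": (0.69, 0.42),
    "CDR corpus, all relations": (0.90, 0.24),
}

for setting, (p, r) in reported.items():
    print(f"{setting}: F = {f_score(p, r):.2f}")
```

Under that assumption, the computed values round to 0.42, 0.52, and 0.38, matching the figures given above. The sentence-bound CDR setting is left out because the abstract reports only its recall and F score.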

CONCLUSIONS

SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f33/7222583/ce20396f91fc/12859_2020_3517_Fig1_HTML.jpg

Similar articles

1
Broad-coverage biomedical relation extraction with SemRep.
BMC Bioinformatics. 2020 May 14;21(1):188. doi: 10.1186/s12859-020-3517-7.
2
Enhancing the coverage of SemRep using a relation classification approach.
J Biomed Inform. 2024 Jul;155:104658. doi: 10.1016/j.jbi.2024.104658. Epub 2024 May 21.
3
Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text.
J Am Med Inform Assoc. 2015 Apr;22(e1):e162-76. doi: 10.1136/amiajnl-2014-002954. Epub 2014 Oct 21.
4
A methodology for extending domain coverage in SemRep.
J Biomed Inform. 2013 Dec;46(6):1099-107. doi: 10.1016/j.jbi.2013.08.005. Epub 2013 Aug 21.
5
Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature.
J Biomed Inform. 2022 Jul;131:104120. doi: 10.1016/j.jbi.2022.104120. Epub 2022 Jun 13.
6
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
7
Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations.
J Biomed Inform. 2018 Jun;82:189-199. doi: 10.1016/j.jbi.2018.05.003. Epub 2018 May 12.
8
Sortal anaphora resolution to enhance relation extraction from biomedical literature.
BMC Bioinformatics. 2016 Apr 14;17:163. doi: 10.1186/s12859-016-1009-6.
9
Using SemRep to label semantic relations extracted from clinical text.
AMIA Annu Symp Proc. 2012;2012:587-95. Epub 2012 Nov 3.
10
Linked open data-based framework for automatic biomedical ontology generation.
BMC Bioinformatics. 2018 Sep 10;19(1):319. doi: 10.1186/s12859-018-2339-3.

Cited by

1
Comparison of pipelines, seq2seq models, and LLMs for rare disease information extraction.
Nat Lang Process Inf Syst. 2026;15836:49-63. doi: 10.1007/978-3-031-97141-9_4. Epub 2025 Jul 1.
3
Alzheimer's disease knowledge graph enhances knowledge discovery and disease prediction.
Comput Biol Med. 2025 Apr 29;192(Pt A):110285. doi: 10.1016/j.compbiomed.2025.110285.
5
Knowledge graph and its application in the study of neurological and mental disorders.
Front Psychiatry. 2025 Mar 18;16:1452557. doi: 10.3389/fpsyt.2025.1452557. eCollection 2025.
6
DiMB-RE: mining the scientific literature for diet-microbiome associations.
J Am Med Inform Assoc. 2025 Jun 1;32(6):998-1006. doi: 10.1093/jamia/ocaf054.
7
A study on large-scale disease causality discovery from biomedical literature.
BMC Med Inform Decis Mak. 2025 Mar 18;25(1):136. doi: 10.1186/s12911-025-02893-0.
8
Data-Driven Hypothesis Generation in Clinical Research: What We Learned from a Human Subject Study?
Med Res Arch. 2024 Feb;12(2). doi: 10.18103/mra.v12i2.5132. Epub 2024 Feb 28.
10
Triangulating evidence in health sciences with Annotated Semantic Queries.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae519.
