Broad-coverage biomedical relation extraction with SemRep.

Affiliations

Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894, MD, USA.

University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, 61820, IL, USA.

Publication information

BMC Bioinformatics. 2020 May 14;21(1):188. doi: 10.1186/s12859-020-3517-7.

DOI:10.1186/s12859-020-3517-7

PMID:32410573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7222583/

Abstract

BACKGROUND

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

RESULTS

A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F score. The recall and the F score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.
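The reported F scores can be checked against the reported precision and recall values. The sketch below assumes that "F score" means the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state this explicitly. The function name `f_score` and the dictionary labels are illustrative only. The sentence-bound CDR figure is left out because the abstract does not report the precision for that setting.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 (assumed here) is the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "strict": (0.55, 0.34),
    "relaxed": (0.69, 0.42),
    "CDR": (0.90, 0.24),
}

# Round to two decimals to match the precision used in the abstract.
scores = {name: round(f_score(p, r), 2) for name, (p, r) in reported.items()}
print(scores)
```

Under that assumption, the computed values agree with the reported F scores of 0.42, 0.52, and 0.38.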

CONCLUSIONS

SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f33/7222583/ce20396f91fc/12859_2020_3517_Fig1_HTML.jpg

Similar articles

Enhancing the coverage of SemRep using a relation classification approach.

J Biomed Inform. 2024 Jul;155:104658. doi: 10.1016/j.jbi.2024.104658. Epub 2024 May 21.

Assessing the role of a medication-indication resource in the treatment relation extraction from clinical text.

J Am Med Inform Assoc. 2015 Apr;22(e1):e162-76. doi: 10.1136/amiajnl-2014-002954. Epub 2014 Oct 21.

A methodology for extending domain coverage in SemRep.

J Biomed Inform. 2013 Dec;46(6):1099-107. doi: 10.1016/j.jbi.2013.08.005. Epub 2013 Aug 21.

Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature.

J Biomed Inform. 2022 Jul;131:104120. doi: 10.1016/j.jbi.2022.104120. Epub 2022 Jun 13.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations.

J Biomed Inform. 2018 Jun;82:189-199. doi: 10.1016/j.jbi.2018.05.003. Epub 2018 May 12.

Sortal anaphora resolution to enhance relation extraction from biomedical literature.

BMC Bioinformatics. 2016 Apr 14;17:163. doi: 10.1186/s12859-016-1009-6.

Using SemRep to label semantic relations extracted from clinical text.

AMIA Annu Symp Proc. 2012;2012:587-95. Epub 2012 Nov 3.

Linked open data-based framework for automatic biomedical ontology generation.

BMC Bioinformatics. 2018 Sep 10;19(1):319. doi: 10.1186/s12859-018-2339-3.

Cited by

Comparison of pipelines, seq2seq models, and LLMs for rare disease information extraction.

Nat Lang Process Inf Syst. 2026;15836:49-63. doi: 10.1007/978-3-031-97141-9_4. Epub 2025 Jul 1.

Evidence triangulator: using large language models to extract and synthesize causal evidence across study designs.

Nat Commun. 2025 Aug 9;16(1):7355. doi: 10.1038/s41467-025-62783-x.

Alzheimer's disease knowledge graph enhances knowledge discovery and disease prediction.

Comput Biol Med. 2025 Apr 29;192(Pt A):110285. doi: 10.1016/j.compbiomed.2025.110285.

Inspired Spine Smart Universal Resource Identifier (SURI): An Adaptive AI Framework for Transforming Multilingual Speech Into Structured Medical Reports.

Cureus. 2025 Mar 26;17(3):e81243. doi: 10.7759/cureus.81243. eCollection 2025 Mar.

Knowledge graph and its application in the study of neurological and mental disorders.

Front Psychiatry. 2025 Mar 18;16:1452557. doi: 10.3389/fpsyt.2025.1452557. eCollection 2025.

DiMB-RE: mining the scientific literature for diet-microbiome associations.

J Am Med Inform Assoc. 2025 Jun 1;32(6):998-1006. doi: 10.1093/jamia/ocaf054.

A study on large-scale disease causality discovery from biomedical literature.

BMC Med Inform Decis Mak. 2025 Mar 18;25(1):136. doi: 10.1186/s12911-025-02893-0.

Data-Driven Hypothesis Generation in Clinical Research: What We Learned from a Human Subject Study?

Med Res Arch. 2024 Feb;12(2). doi: 10.18103/mra.v12i2.5132. Epub 2024 Feb 28.

Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach.

Database (Oxford). 2024 Aug 28;2024. doi: 10.1093/database/baae079.

Triangulating evidence in health sciences with Annotated Semantic Queries.

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae519.
