Broad-coverage biomedical relation extraction with SemRep.

Affiliations

Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894, MD, USA.

University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, 61820, IL, USA.

Publication information

BMC Bioinformatics. 2020 May 14;21(1):188. doi: 10.1186/s12859-020-3517-7.

Abstract

BACKGROUND

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

RESULTS

A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F score. The recall and the F score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.
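
As a quick consistency check, the reported F scores can be recomputed from the precision and recall figures, assuming the balanced F score (F1, the harmonic mean of precision and recall). This is a minimal sketch: the helper name `f_score` is ours, and for the sentence-bound CDR setting we assume precision stays at 0.90, because the abstract reports only recall and F score as changing.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) as given in the abstract.
# Assumption: precision remains 0.90 in the sentence-bound CDR setting.
reported = {
    "strict (manual set)": (0.55, 0.34, 0.42),
    "relaxed (manual set)": (0.69, 0.42, 0.52),
    "CDR (all relations)": (0.90, 0.24, 0.38),
    "CDR (sentence-bound)": (0.90, 0.35, 0.50),
}

for name, (p, r, f_reported) in reported.items():
    print(f"{name}: F = {f_score(p, r):.2f} (reported {f_reported:.2f})")
```

Each recomputed value agrees with the reported one to two decimal places; any residual difference comes from the precision and recall inputs already being rounded.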

CONCLUSIONS

SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
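
To illustrate how sentence-level relations can be pooled into a literature-scale graph of the kind SemMedDB represents, the sketch below aggregates subject-predicate-object triples into a directed graph whose edge weights count supporting mentions. It is a hypothetical illustration only: the triples are made up (not SemMedDB records), the function `build_graph` is our own name, and SemMedDB's actual schema and construction process are not described in this abstract.

```python
from collections import Counter, defaultdict

# Hypothetical predications (subject, predicate, object); illustrative only,
# not records extracted by SemRep or stored in SemMedDB.
predications = [
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "INTERACTS_WITH", "Warfarin"),
    ("Smoking", "CAUSES", "Lung cancer"),
]


def build_graph(triples):
    """Aggregate triples into subject -> {(predicate, object): mention count}."""
    counts = Counter(triples)
    graph = defaultdict(dict)
    for (subj, pred, obj), n in counts.items():
        graph[subj][(pred, obj)] = n
    return graph


graph = build_graph(predications)
for subj, edges in graph.items():
    for (pred, obj), n in edges.items():
        print(f"{subj} -[{pred} x{n}]-> {obj}")
```

Counting repeated triples rather than deduplicating them keeps a simple signal of how often the literature asserts each relation, which downstream applications could use to rank edges.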

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f33/7222583/ce20396f91fc/12859_2020_3517_Fig1_HTML.jpg