The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts.

Affiliation

Human Language Technology Research Institute, Department of Computer Science, Erik Jonsson School of Engineering & Computer Science, University of Texas at Dallas, Richardson, Texas, USA.

Publication information

J Am Med Inform Assoc. 2020 Oct 1;27(10):1556-1567. doi: 10.1093/jamia/ocaa205.

DOI: 10.1093/jamia/ocaa205

PMID: 33029619

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7647370/

Abstract

OBJECTIVE

We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts.

MATERIALS AND METHODS

Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus: lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) enabled learning either form, as well as neural models capable of producing LKEs for mentions of biomedical concepts in texts and relation types that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for relation types of interest and their arguments.
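The practical difference between the two forms can be illustrated with a minimal sketch. It assumes that an unlexicalized KE is a vector looked up by concept identifier, while a lexicalized KE is computed from the words naming the concept, which is why it can also be produced for a mention that has no Metathesaurus entry. The abstract does not describe the KEE architecture, so everything here is hypothetical: the token averaging stands in for a trained neural encoder, the hash-derived vectors stand in for learned parameters, and the identifiers and dimension are placeholders.

```python
import hashlib
import math

DIM = 8  # illustrative embedding size, not taken from the paper


def _vector(key, dim=DIM):
    """Deterministic unit vector standing in for a learned embedding."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    raw = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


# Unlexicalized KEs: one vector per concept identifier (placeholder IDs).
UNLEXICALIZED = {cui: _vector("cui:" + cui) for cui in ("CUI-ASPIRIN", "CUI-HEADACHE")}


def unlexicalized_ke(cui):
    """Table lookup; returns None for a concept that was never embedded."""
    return UNLEXICALIZED.get(cui)


def lexicalized_ke(mention):
    """Average of per-token vectors, a stand-in for a neural encoder over words."""
    tokens = mention.lower().split()
    if not tokens:
        raise ValueError("mention must contain at least one token")
    vectors = [_vector("tok:" + token) for token in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# A mention with no table entry still receives a lexicalized embedding.
print(unlexicalized_ke("CUI-UNKNOWN"))
print(len(lexicalized_ke("low dose aspirin")))
```

The lookup fails for anything outside the table, whereas the word-based function accepts any non-empty string. Under the stated assumption, that is the property that would let LKEs cover mentions and relation types absent from the Metathesaurus.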

RESULTS

The incorporation of either LKEs or unlexicalized KEs in REKE advances the state of the art in relation extraction on 2 datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, LKEs have the greater impact, achieving F1 scores of 78.2 and 82.0 on these datasets, respectively.
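For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, and the scores above are on a 0 to 100 scale. The sketch below shows the computation from true-positive, false-positive, and false-negative counts, with a micro-averaged variant that pools counts across relation types. Whether the reported scores are micro- or macro-averaged is not stated in the abstract, so the pooling is an assumption, and the relation type names and counts are invented for illustration only.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_f1(counts_by_type):
    """Pool (tp, fp, fn) counts over all relation types, then compute F1 once."""
    tp = sum(c[0] for c in counts_by_type.values())
    fp = sum(c[1] for c in counts_by_type.values())
    fn = sum(c[2] for c in counts_by_type.values())
    return f1_score(tp, fp, fn)


# Hypothetical counts for two made-up relation types.
counts = {"type_a": (8, 2, 2), "type_b": (2, 0, 2)}
print(round(100 * micro_f1(counts), 1))
```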

DISCUSSION

REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but it also showcases the subtleties of incorporating KEs in relation extraction systems.

CONCLUSIONS

Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.

