Human Language Technology Research Institute, Department of Computer Science, Erik Jonsson School of Engineering & Computer Science, University of Texas at Dallas, Richardson, Texas, USA.
J Am Med Inform Assoc. 2020 Oct 1;27(10):1556-1567. doi: 10.1093/jamia/ocaa205.
We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts.
Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus: lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) learned either form, together with neural models capable of producing LKEs for mentions of biomedical concepts in text and for relation types that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for the relation types of interest and their arguments.
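The practical difference between the two forms of KEs can be illustrated with a toy sketch. It is not the KEE architecture, which the abstract does not specify. It assumes that an unlexicalized KE is a per-identifier table lookup and that a lexicalized KE is computed from the words of a concept name (here by simple averaging). The identifiers, the dimension, and the hash-seeded vectors standing in for learned parameters are all hypothetical.

```python
import hashlib
import random

DIM = 8  # illustrative embedding size, not taken from the paper


def _seeded_vector(key, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned parameter."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Assumption: unlexicalized KEs are stored per concept identifier.
# The identifiers below are placeholders, not real UMLS CUIs.
UNLEXICALIZED_TABLE = {
    cui: _seeded_vector("cui:" + cui)
    for cui in ("C_EXAMPLE_1", "C_EXAMPLE_2")
}


def unlexicalized_ke(cui):
    """Look up a concept by identifier; raises KeyError if it is not in the table."""
    return UNLEXICALIZED_TABLE[cui]


def lexicalized_ke(name):
    """Build an embedding from the words of a name (toy averaging encoder)."""
    tokens = name.lower().split()
    if not tokens:
        raise ValueError("name must contain at least one token")
    vectors = [_seeded_vector("tok:" + token) for token in tokens]
    # Average the token vectors dimension by dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Under these assumptions, `lexicalized_ke("chest pain")` returns a vector for any text mention, whereas `unlexicalized_ke` can only serve identifiers that were present when the table was built. This mirrors the abstract's point that LKEs can be produced for mentions and relation types not encoded in the Metathesaurus.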
Incorporating either LKEs or unlexicalized KEs in REKE advances the state of the art on 2 relation extraction datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, LKEs have the larger impact, achieving F1 scores of 78.2 and 82.0, respectively.
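For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name and the counts in the usage note are illustrative, and the abstract does not state how scores are averaged across relation types, so no averaging scheme is assumed here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return F1 in [0, 1], the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # nothing predicted, or nothing to find
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0  # no correct predictions at all
    return 2 * precision * recall / (precision + recall)
```

With made-up counts of 8 correct relations, 2 spurious ones, and 2 missed ones, precision and recall are both 0.8, so F1 is 0.8. Scores such as those reported above are this quantity expressed as a percentage.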
REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but it also showcases the subtleties of incorporating KEs in relation extraction systems.
Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.