Improving medical term embeddings using UMLS Metathesaurus.

Affiliation

Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.

Publication information

BMC Med Inform Decis Mak. 2022 Apr 29;22(1):114. doi: 10.1186/s12911-022-01850-5.

Abstract

BACKGROUND

Health providers create Electronic Health Records (EHRs) to describe their patients' conditions and the procedures used to treat them. Medical notes entered by medical staff as free text are a particularly insightful component of EHRs. There is great interest in applying machine learning tools to medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of the terms in the notes is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of EHRs available in practical applications is often very small.

METHODS

In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus.
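The abstract does not spell out how definitions enter the skip-gram objective, so the following is only a minimal sketch of one plausible reading: in addition to the usual (target, context) pairs drawn from a sliding window over the notes, each term is paired with the words of its definition, so that a term absent from the corpus still receives training signal. The function names `skipgram_pairs` and `definition_pairs`, the window size, and the toy note and definition are all hypothetical; the definition text is a placeholder, not an actual UMLS entry, and this is not presented as the authors' implementation.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs from one tokenized note.

    Standard skip-gram pair generation: every token is paired with
    each neighbour at most `window` positions away.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def definition_pairs(definitions):
    """Pair each term with every word of its definition.

    ASSUMPTION: this pairing scheme is illustrative only; the abstract
    does not state how definition2vec uses the definitions.
    """
    return [(term, word)
            for term, words in definitions.items()
            for word in words]


# Toy data (hypothetical): one note, and one placeholder definition
# for a term that never appears in the note.
notes = [["patient", "has", "sepsis", "and", "fever"]]
definitions = {"septicemia": ["infection", "of", "the", "blood"]}

pairs = [p for note in notes for p in skipgram_pairs(note)]
pairs += definition_pairs(definitions)
```

In this sketch the unseen term `septicemia` contributes training pairs even though it never occurs in `notes`, which is the property the abstract attributes to using definitions; the combined `pairs` list would then be fed to any ordinary skip-gram trainer.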

RESULTS

To evaluate the proposed approach, we used the publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps semantically similar medical terms close together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that the learned vector embeddings are helpful in downstream medical informatics applications.
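One common way to check whether similar terms sit close together in an embedding space is a nearest-neighbour lookup by cosine similarity. The sketch below assumes that measure; the abstract does not say which similarity measure the authors used, and the two-dimensional vectors and the helper names `cosine` and `nearest` are hypothetical values chosen only to illustrate the idea.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(term, embeddings, k=1):
    """Return the k terms most similar to `term`, best first."""
    scores = [(other, cosine(embeddings[term], vec))
              for other, vec in embeddings.items()
              if other != term]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings (hypothetical values, not results from the paper).
embeddings = {
    "sepsis": [1.0, 0.0],
    "septicemia": [0.9, 0.1],
    "fracture": [0.0, 1.0],
}

closest_term = nearest("sepsis", embeddings)[0][0]
```

With these toy vectors, `closest_term` is `"septicemia"`, the kind of neighbourhood a qualitative inspection of learned embeddings would look for.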

CONCLUSION

This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8c3/9052653/118828776be6/12911_2022_1850_Fig1_HTML.jpg