Improving medical term embeddings using UMLS Metathesaurus.

Affiliation

Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.

Publication information

BMC Med Inform Decis Mak. 2022 Apr 29;22(1):114. doi: 10.1186/s12911-022-01850-5.

DOI:10.1186/s12911-022-01850-5
PMID:35488252
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9052653/
Abstract

BACKGROUND

Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is great interest in applying machine learning tools to medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small.

METHODS

In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It extends the skip-gram algorithm by incorporating textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus.
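
The abstract does not say how definitions enter the skip-gram objective, so the following is only a minimal sketch of one plausible reading: ordinary skip-gram (target, context) pairs are drawn from a note, and each word of a term's definition is additionally treated as a context word for that term. The function names `skipgram_pairs` and `definition_pairs`, the window size, and the toy definition are all assumptions made for illustration; they are not taken from the paper or from UMLS.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Pair]:
    """Standard skip-gram (target, context) pairs within a symmetric window."""
    pairs: List[Pair] = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def definition_pairs(definitions: Dict[str, List[str]]) -> List[Pair]:
    """Assumed extension: every definition word becomes a context of the defined term."""
    return [(term, word) for term, words in definitions.items() for word in words]


# Toy note and a made-up definition (not actual UMLS text).
note = ["patient", "has", "dyspnea", "and", "fever"]
definitions = {"dyspnea": ["difficult", "or", "labored", "breathing"]}

# Training pairs from the corpus plus pairs contributed by the definition.
pairs = skipgram_pairs(note, window=1) + definition_pairs(definitions)
```

Under this assumption, a term that is rare in the notes still receives training signal from its definition words, which is the intuition the abstract describes.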

RESULTS

To evaluate the proposed approach, we used the publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that the learned vector embeddings are helpful in downstream medical informatics applications.
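
One common way to check whether semantically similar terms sit close together in an embedding space is a cosine-similarity nearest-neighbor lookup. The sketch below illustrates that generic check; it is not the paper's evaluation protocol, and the three-dimensional vectors and the helper names `cosine` and `nearest` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(term: str, emb: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the k terms whose vectors are most similar to the vector of `term`."""
    scores = [(other, cosine(emb[term], vec)) for other, vec in emb.items() if other != term]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scores[:k]]


# Hypothetical toy embeddings, chosen by hand for illustration only.
toy_embeddings = {
    "dyspnea": [0.9, 0.1, 0.0],
    "breathlessness": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.1, 0.9],
}

closest = nearest("dyspnea", toy_embeddings)
```

With these hand-picked vectors, the nearest neighbor of "dyspnea" is "breathlessness" rather than "fracture"; a real evaluation would use learned embeddings and a reference set of related term pairs.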

CONCLUSION

This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8c3/9052653/6ec83a7e9aa8/12911_2022_1850_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8c3/9052653/118828776be6/12911_2022_1850_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8c3/9052653/a935db3decd9/12911_2022_1850_Fig2_HTML.jpg
