Exploiting hierarchy in medical concept embedding.

Authors

Finch Anthony, Crowell Alexander, Bhatia Mamta, Parameshwarappa Pooja, Chang Yung-Chieh, Martinez Jose, Horberg Michael

Affiliations

Kaiser Permanente Mid-Atlantic Permanente Medical Group, Rockville, Maryland, USA.

Kaiser Permanente Mid-Atlantic Permanente Research Institute, Rockville, Maryland, USA.

Publication information

JAMIA Open. 2021 Mar 16;4(1):ooab022. doi: 10.1093/jamiaopen/ooab022. eCollection 2021 Jan.

DOI:10.1093/jamiaopen/ooab022
PMID:33748691
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7962787/
Abstract

OBJECTIVE

To construct and publicly release a set of medical concept embeddings for codes following the ICD-10 coding standard which explicitly incorporate hierarchical information from medical codes into the embedding formulation.

MATERIALS AND METHODS

We trained concept embeddings using several new extensions to the Word2Vec algorithm using a dataset of approximately 600,000 patients from a major integrated healthcare organization in the Mid-Atlantic US. Our concept embeddings included additional entities to account for the medical categories assigned to codes by the Clinical Classification Software Revised (CCSR) dataset. We compare these results to sets of publicly released pretrained embeddings and alternative training methodologies.
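
The abstract does not spell out how the category entities enter training, so the following is only a minimal sketch of one plausible reading: each code in a patient's sequence is followed by a token for its category, and Skip-Gram (target, context) pairs are then drawn from the augmented sequence. The mapping `CODE_TO_CATEGORY`, the `CAT_*` token names, and the helper functions are hypothetical illustrations, not the authors' implementation or real CCSR identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical toy mapping from ICD-10 codes to category tokens.
# Real assignments should be taken from the CCSR release itself.
CODE_TO_CATEGORY: Dict[str, str] = {
    "E11.9": "CAT_DIABETES",
    "E11.65": "CAT_DIABETES",
    "I10": "CAT_HYPERTENSION",
}


def add_category_tokens(codes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return the sequence with each code followed by its category token."""
    out: List[str] = []
    for code in codes:
        out.append(code)
        category = mapping.get(code)
        if category is not None:  # unmapped codes are kept without a category
            out.append(category)
    return out


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Enumerate (target, context) pairs within a symmetric window."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# One hypothetical patient record with two diagnosis codes.
patient_codes = ["E11.9", "I10"]
augmented = add_category_tokens(patient_codes, CODE_TO_CATEGORY)
pairs = skipgram_pairs(augmented, window=1)
```

Because codes and categories share one token sequence in this sketch, they would also share one embedding space, which is what lets a code's vector be pulled toward its category's vector during training.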

RESULTS

We found that Word2Vec models which included hierarchical data outperformed ordinary Word2Vec alternatives on tasks which compared naïve clusters to canonical ones provided by CCSR. Our Skip-Gram model with both codes and categories achieved 61.4% normalized mutual information with canonical labels in comparison to 57.5% with traditional Skip-Gram. In models operating on two different outcomes, we found that including hierarchical embedding data improved classification performance 96.2% of the time. When controlling for all other variables, we found that co-training embeddings improved classification performance 66.7% of the time. We found that all models outperformed our competitive benchmarks.
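
Normalized mutual information (NMI) scores how well one partition of the codes agrees with another, independent of how the clusters are named. The sketch below computes it in plain Python, assuming normalization by the arithmetic mean of the two entropies; the abstract does not state which normalization the authors used, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(
    a: Sequence[Hashable], b: Sequence[Hashable]
) -> float:
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both assignments are a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Toy check: the same partition under different names, then an unrelated one.
same = normalized_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In the setting described above, one sequence would hold the cluster assigned to each code from its embedding and the other the code's canonical CCSR category, so a higher score means the embedding clusters recover the canonical grouping more closely.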

DISCUSSION

We found significant evidence that our proposed algorithms can express the hierarchical structure of medical codes more fully than ordinary Word2Vec models, and that this improvement carries forward into classification tasks. As part of this publication, we have released several sets of pretrained medical concept embeddings using the ICD-10 standard which significantly outperform other well-known pretrained vectors on our tested outcomes.
