DOME: Directional medical embedding vectors from Electronic Health Records.

Authors

Wen Jun, Xue Hao, Rush Everett, Panickan Vidul A, Cai Tianrun, Zhou Doudou, Ho Yuk-Lam, Costa Lauren, Begoli Edmon, Hong Chuan, Gaziano J Michael, Cho Kelly, Liao Katherine P, Lu Junwei, Cai Tianxi

Affiliations

Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA.

Department of Computational Biology, Cornell University, Ithaca, NY, USA.

Publication

J Biomed Inform. 2025 Feb;162:104768. doi: 10.1016/j.jbi.2024.104768. Epub 2025 Jan 2.

DOI:10.1016/j.jbi.2024.104768
PMID:39755324
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12040072/
Abstract

MOTIVATION

The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning have produced effective large-scale representations of EHR concepts, along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training on patient-level data, which limits their ability to scale training to multi-institutional EHR data. Scalable approaches that require only summary-level data, on the other hand, do not incorporate temporal dependencies between concepts.

METHODS

We introduce a DirectiOnal Medical Embedding (DOME) algorithm that encodes temporally directional relationships between medical concepts using summary-level EHR data. Specifically, DOME first aggregates patient-level EHR data into an asymmetric co-occurrence matrix. It then computes two Positive Pointwise Mutual Information (PPMI) matrices that encode, respectively, the pairwise prior and posterior dependencies between medical concepts. Finally, a joint matrix factorization is performed on the two PPMI matrices, yielding three vectors for each concept: a semantic embedding and two directional context embeddings. Together, these vectors provide a comprehensive depiction of the temporal relationships between EHR concepts.
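The three steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). Several details are assumptions made only for illustration: the 30-day window, the rule that same-day events carry no direction, the use of the transposed count matrix for the second PPMI matrix, and a truncated SVD of the two PPMI matrices placed side by side as the "joint factorization". All function names are hypothetical.

```python
import numpy as np

def directional_cooccurrence(patients, n_concepts, window=30):
    """C[i, j] counts pairs where concept j is recorded strictly after
    concept i, within `window` days, in the same patient (assumed rule)."""
    C = np.zeros((n_concepts, n_concepts))
    for events in patients:              # events: list of (day, concept_id)
        events = sorted(events)
        for a, (day_a, i) in enumerate(events):
            for day_b, j in events[a + 1:]:
                if day_b - day_a > window:
                    break                # later events are even further away
                if day_b > day_a and i != j:
                    C[i, j] += 1
    return C

def ppmi(C):
    """Positive pointwise mutual information of a count matrix."""
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = C.sum(axis=1, keepdims=True) * C.sum(axis=0, keepdims=True) / C.sum()
        pmi = np.log(C / expected)
    pmi[~np.isfinite(pmi)] = 0.0         # zero counts contribute nothing
    return np.maximum(pmi, 0.0)

def joint_factorize(P_after, P_before, k):
    """Assumed joint factorization: one truncated SVD of [P_after | P_before].
    Returns a shared semantic embedding and two directional context embeddings."""
    n = P_after.shape[0]
    U, s, Vt = np.linalg.svd(np.hstack([P_after, P_before]), full_matrices=False)
    root = np.sqrt(s[:k])
    semantic = U[:, :k] * root           # n x k, shared across both directions
    context = Vt[:k].T * root            # 2n x k, split by direction below
    return semantic, context[:n], context[n:]

# Toy data (hypothetical): three patients, concept ids 0..2, days since first visit.
patients = [[(0, 0), (5, 1), (10, 2)], [(0, 0), (3, 1)], [(0, 1), (2, 2)]]
C = directional_cooccurrence(patients, n_concepts=3)
P_after = ppmi(C)        # dependence of a concept on what follows it
P_before = ppmi(C.T)     # dependence of a concept on what precedes it
semantic, ctx_after, ctx_before = joint_factorize(P_after, P_before, k=3)
```

Only the aggregated matrix `C` is needed after the first step, which is what makes a summary-level approach shareable across institutions. Under the simple definition used here `P_before` is just the transpose of `P_after`; the paper's two PPMI matrices may be defined differently.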

RESULTS

We highlight the advantages and translational potential of DOME through three sets of validation studies. First, DOME consistently improves on existing direction-agnostic embedding vectors for disease risk prediction in several diseases, for example achieving a relative gain of 5.5% in the area under the receiver operating characteristic curve (AUROC) for lung cancer. Second, DOME excels at directional drug-disease relationship inference, successfully differentiating between drug side effects and indications, with relative AUROC gains over state-of-the-art methods of 10.8% and 6.6%, respectively. Finally, DOME effectively constructs directional knowledge graphs that distinguish disease risk factors from comorbidities, thereby revealing disease progression trajectories. The source code is available at https://github.com/celehs/Directional-EHR-embedding.
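The abstract reports relative rather than absolute AUROC gains. A small sketch of the conventional definition, assuming that is the one used here, is shown below. The baseline value of 0.80 is hypothetical and chosen only to show the arithmetic; it does not come from the paper.

```python
def relative_gain(baseline_auroc: float, new_auroc: float) -> float:
    """Relative improvement of new_auroc over baseline_auroc (assumed definition)."""
    return (new_auroc - baseline_auroc) / baseline_auroc

# Hypothetical numbers: a baseline AUROC of 0.80 rising to 0.844
# is an absolute gain of 0.044 and a relative gain of about 5.5%.
example = relative_gain(0.80, 0.844)
```

A relative gain therefore depends on the baseline: the same absolute improvement yields a larger percentage when the baseline AUROC is lower.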
