DOME: Directional medical embedding vectors from Electronic Health Records.

Author information

Wen Jun, Xue Hao, Rush Everett, Panickan Vidul A, Cai Tianrun, Zhou Doudou, Ho Yuk-Lam, Costa Lauren, Begoli Edmon, Hong Chuan, Gaziano J Michael, Cho Kelly, Liao Katherine P, Lu Junwei, Cai Tianxi

Affiliations

Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA.

Department of Computational Biology, Cornell University, Ithaca, NY, USA.

Publication information

J Biomed Inform. 2025 Feb;162:104768. doi: 10.1016/j.jbi.2024.104768. Epub 2025 Jan 2.

Abstract

MOTIVATION

The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning have produced effective large-scale representations of EHR concepts, along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training on patient-level data, which limits their ability to extend training to multi-institutional EHR data. Scalable approaches that require only summary-level data, on the other hand, do not incorporate temporal dependencies between concepts.

METHODS

We introduce the DirectiOnal Medical Embedding (DOME) algorithm, which encodes temporally directional relationships between medical concepts using summary-level EHR data. Specifically, DOME first aggregates patient-level EHR data into an asymmetric co-occurrence matrix. It then computes two Positive Pointwise Mutual Information (PPMI) matrices that respectively encode the pairwise prior and posterior dependencies between medical concepts. Finally, a joint matrix factorization is performed on the two PPMI matrices, yielding three vectors for each concept: a semantic embedding and two directional context embeddings. Together, these vectors provide a comprehensive depiction of the temporal relationships between EHR concepts.
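The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact PPMI definitions or the factorization objective, so the sketch assumes a fixed look-ahead window for counting ordered co-occurrences, takes the two PPMI matrices from the count matrix and its transpose, and uses a truncated SVD of the concatenated matrices as a stand-in for the joint factorization. All function names, the `window` and `dim` parameters, and the toy sequences are hypothetical.

```python
import numpy as np

def directional_cooccurrence(sequences, n_concepts, window=2):
    """C[i, j] counts how often concept i is followed by concept j
    within `window` later positions (assumed windowing scheme)."""
    C = np.zeros((n_concepts, n_concepts))
    for seq in sequences:
        for t, i in enumerate(seq):
            for j in seq[t + 1 : t + 1 + window]:
                C[i, j] += 1.0
    return C

def ppmi(C):
    """Positive pointwise mutual information of a count matrix."""
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    expected = row @ col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts or empty rows/columns
    return np.maximum(pmi, 0.0)

def joint_factorize(P_after, P_before, dim):
    """Assumed stand-in for the joint factorization: truncated SVD of
    [P_after, P_before], giving one shared embedding per concept and
    two direction-specific context embeddings."""
    n = P_after.shape[0]
    stacked = np.hstack([P_after, P_before])  # shape (n, 2n)
    U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    scale = np.sqrt(s[:dim])
    semantic = U[:, :dim] * scale
    contexts = Vt[:dim, :].T * scale          # shape (2n, dim)
    return semantic, contexts[:n], contexts[n:]

# Toy time-ordered concept sequences for three hypothetical patients.
sequences = [[0, 1, 2], [0, 1, 3], [0, 2, 3]]
C = directional_cooccurrence(sequences, n_concepts=4)
P_after = ppmi(C)     # row i: concepts that tend to follow concept i
P_before = ppmi(C.T)  # row i: concepts that tend to precede concept i
semantic, ctx_after, ctx_before = joint_factorize(P_after, P_before, dim=2)
```

Only the count matrix `C` depends on patient-level sequences here. Every later step works on aggregated counts, which is the property that lets summary-level data be pooled across institutions.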

RESULTS

We highlight the advantages and translational potential of DOME through three sets of validation studies. First, DOME consistently improves on existing direction-agnostic embedding vectors for disease risk prediction across several diseases, for example achieving a relative gain of 5.5% in the area under the receiver operating characteristic curve (AUROC) for lung cancer. Second, DOME excels at directional drug-disease relationship inference, successfully differentiating between drug side effects and indications and achieving relative AUROC gains over state-of-the-art methods of 10.8% and 6.6%, respectively. Finally, DOME effectively constructs directional knowledge graphs that distinguish disease risk factors from comorbidities, thereby revealing disease progression trajectories. The source code is available at https://github.com/celehs/Directional-EHR-embedding.


