Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: Cardiovascular disease case study.

Author information

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.

Fixed Income Division, Morgan Stanley & Co LLC, New York, NY, USA.

Publication information

J Biomed Inform. 2019 Oct;98:103270. doi: 10.1016/j.jbi.2019.103270. Epub 2019 Aug 22.

Abstract

OBJECTIVE

Discovering subphenotypes of complex diseases can help characterize disease cohorts for investigative studies aimed at developing better diagnoses and treatments. Recent advances in unsupervised machine learning on electronic health record (EHR) data have enabled researchers to discover phenotypes without input from domain experts. However, most existing studies have ignored time and modeled diseases as discrete events. Uncovering the evolution of phenotypes (how they emerge, evolve, and contribute to health outcomes) is essential to defining more precise phenotypes and refining the understanding of disease progression. Our objective was to assess the benefits of an unsupervised approach that incorporates time to model diseases as dynamic processes in phenotype discovery.

METHODS

In this study, we applied a constrained non-negative tensor-factorization approach to characterize the complexity of a cardiovascular disease (CVD) patient cohort based on longitudinal EHR data. Through tensor factorization, we identified a set of phenotypic topics (i.e., subphenotypes) that these patients developed over the 10 years prior to the diagnosis of CVD, and showed their progression patterns. For each identified subphenotype, we examined its association with the risk for adverse cardiovascular outcomes estimated by the American College of Cardiology/American Heart Association Pooled Cohort Risk Equations, a conventional CVD-risk assessment tool frequently used in clinical practice. Furthermore, we compared the subsequent myocardial infarction (MI) rates among the six most prevalent subphenotypes using survival analysis.
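As a rough illustration of the kind of factorization described above, the following is a minimal sketch of a non-negative CP (PARAFAC) decomposition fitted with multiplicative updates. It is not the authors' implementation: the abstract does not specify the algorithm or the additional constraints used, so this sketch enforces non-negativity only. The tensor layout (patients x phenotype codes x time bins), the synthetic data, and all function names are assumptions made for illustration.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R) -> (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Matricize a 3-way tensor so that rows index the chosen mode."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def nonneg_cp(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Non-negative CP decomposition of a 3-way tensor via multiplicative updates."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            # Gram matrix of the Khatri-Rao product, computed without forming it twice.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + eps
            # Multiplying by a non-negative ratio keeps every entry non-negative.
            factors[mode] *= numer / denom
    return factors

# Hypothetical synthetic tensor: 30 patients x 12 phenotype codes x 10 yearly bins.
rng = np.random.default_rng(1)
true = [rng.random((30, 3)), rng.random((12, 3)), rng.random((10, 3))]
x = np.einsum("ir,jr,kr->ijk", *true)

factors = nonneg_cp(x, rank=3)
x_hat = np.einsum("ir,jr,kr->ijk", *factors)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Under this assumed layout, each column of the second factor would weight the phenotype codes that make up one topic, and the matching column of the third factor would trace how that topic's intensity changes over the time bins.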

RESULTS

From a cohort of 12,380 adult individuals with CVD and 1,068 unique PheCodes, we identified 14 subphenotypes. Through association analysis with the estimated CVD risk for each subphenotype, we found that some phenotypic topics, such as Vitamin D deficiency and depression, and Urinary infections, cannot be explained by the conventional risk factors. Through survival analysis, we found markedly different risks of subsequent MI following the diagnosis of CVD among the six most prevalent topics (p < 0.0001), indicating that these topics may capture clinically meaningful subphenotypes of CVD.
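The abstract does not say which survival estimator or significance test produced the comparison above. As one common building block for such an analysis, the sketch below computes a Kaplan-Meier survival curve for a single group; repeating it per subphenotype would give curves that can be compared. The function name and the toy follow-up data are hypothetical and are not taken from the study.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. MI) was observed, 0 if the patient was censored
    Returns the distinct event times and the survival probability just after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        d = np.sum((times == t) & events)        # events occurring exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy group: five patients, follow-up in years.
event_times, surv = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(event_times, surv)
```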

CONCLUSION

This study demonstrates the potential benefits of using tensor decomposition to model diseases as dynamic processes from longitudinal EHR data. Our results suggest that this data-driven approach may help researchers identify subphenotypes of complex and chronic diseases in precision medicine research.
