Learning probabilistic phenotypes from heterogeneous EHR data.

Authors

Pivovarov Rimma, Perotte Adler J, Grave Edouard, Angiolillo John, Wiggins Chris H, Elhadad Noémie

Affiliations

Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.

College of Physicians and Surgeons, Columbia University, New York, NY, USA.

Publication information

J Biomed Inform. 2015 Dec;58:156-165. doi: 10.1016/j.jbi.2015.10.001. Epub 2015 Oct 14.

DOI: 10.1016/j.jbi.2015.10.001

PMID: 26464024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8025140/

Abstract

We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion. We apply UPhenome to two qualitatively different mixtures of patients and diseases: records of extremely sick patients in the intensive care unit with constant monitoring, and records of outpatients regularly followed by care providers over multiple years. We demonstrate that the UPhenome model can learn from these different care settings, without any additional adaptation. Our experiments show that (i) the learned phenotypes combine the heterogeneous data types more coherently than baseline LDA-based phenotypes; (ii) they each represent single diseases rather than a mix of diseases more often than the baseline ones; and (iii) when applied to unseen patient records, they are correlated with the patients' ground-truth disorders. Code for training, inference, and quantitative evaluation is made available to the research community.

Similar articles

Learning probabilistic phenotypes from heterogeneous EHR data.

J Biomed Inform. 2015 Dec;58:156-165. doi: 10.1016/j.jbi.2015.10.001. Epub 2015 Oct 14.

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.

J Am Med Inform Assoc. 2018 Oct 1;25(10):1359-1365. doi: 10.1093/jamia/ocy056.

Unsupervised probabilistic models for sequential Electronic Health Records.

J Biomed Inform. 2022 Oct;134:104163. doi: 10.1016/j.jbi.2022.104163. Epub 2022 Aug 28.

MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record.

J Biomed Inform. 2022 Oct;134:104190. doi: 10.1016/j.jbi.2022.104190. Epub 2022 Sep 1.

EHR phenotyping via jointly embedding medical concepts and words into a unified vector space.

BMC Med Inform Decis Mak. 2018 Dec 12;18(Suppl 4):123. doi: 10.1186/s12911-018-0672-0.

Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records.

J Biomed Inform. 2020 Feb;102:103364. doi: 10.1016/j.jbi.2019.103364. Epub 2019 Dec 28.

A methodology of phenotyping ICU patients from EHR data: High-fidelity, personalized, and interpretable phenotypes estimation.

J Biomed Inform. 2023 Dec;148:104547. doi: 10.1016/j.jbi.2023.104547. Epub 2023 Nov 18.

Generative transfer learning for measuring plausibility of EHR diagnosis records.

J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.

EHR-based phenotyping: Bulk learning and evaluation.

J Biomed Inform. 2017 Jun;70:35-51. doi: 10.1016/j.jbi.2017.04.009. Epub 2017 Apr 12.

Cited by

Clinical Research Informatics: a Decade-in-Review.

Yearb Med Inform. 2024 Aug;33(1):127-142. doi: 10.1055/s-0044-1800732. Epub 2025 Apr 8.

Finding Long-COVID: temporal topic modeling of electronic health records from the N3C and RECOVER programs.

NPJ Digit Med. 2024 Oct 21;7(1):296. doi: 10.1038/s41746-024-01286-3.

Finding Long-COVID: Temporal Topic Modeling of Electronic Health Records from the N3C and RECOVER Programs.

medRxiv. 2024 Jun 11:2023.09.11.23295259. doi: 10.1101/2023.09.11.23295259.

Integrating Structured and Unstructured EHR Data for Predicting Mortality by Machine Learning and Latent Dirichlet Allocation Method.

Int J Environ Res Public Health. 2023 Feb 28;20(5):4340. doi: 10.3390/ijerph20054340.

Multimodal machine learning in precision health: A scoping review.

NPJ Digit Med. 2022 Nov 7;5(1):171. doi: 10.1038/s41746-022-00712-8.

Real-world data: a brief review of the methods, applications, challenges and opportunities.

BMC Med Res Methodol. 2022 Nov 5;22(1):287. doi: 10.1186/s12874-022-01768-6.

A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record.

Sci Rep. 2022 Oct 22;12(1):17737. doi: 10.1038/s41598-022-22585-3.

Unsupervised probabilistic models for sequential Electronic Health Records.

J Biomed Inform. 2022 Oct;134:104163. doi: 10.1016/j.jbi.2022.104163. Epub 2022 Aug 28.

Genetic heterogeneity: Challenges, impacts, and methods through an associative lens.

Genet Epidemiol. 2022 Dec;46(8):555-571. doi: 10.1002/gepi.22497. Epub 2022 Aug 4.

PheNominal: an EHR-integrated web application for structured deep phenotyping at the point of care.

BMC Med Inform Decis Mak. 2022 Jul 28;22(Suppl 2):198. doi: 10.1186/s12911-022-01927-1.

References

Using Anchors to Estimate Clinical State without Labeled Data.

AMIA Annu Symp Proc. 2014 Nov 14;2014:606-15. eCollection 2014.

Extracting research-quality phenotypes from electronic health records to support precision medicine.

Genome Med. 2015 Apr 30;7(1):41. doi: 10.1186/s13073-015-0166-y. eCollection 2015.

Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

J Am Med Inform Assoc. 2015 Sep;22(5):993-1000. doi: 10.1093/jamia/ocv034. Epub 2015 Apr 29.

Development of phenotype algorithms using electronic medical records and incorporating natural language processing.

BMJ. 2015 Apr 24;350:h1885. doi: 10.1136/bmj.h1885.

Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis.

J Am Med Inform Assoc. 2015 Jul;22(4):872-80. doi: 10.1093/jamia/ocv024. Epub 2015 Apr 20.

Building bridges across electronic health record systems through inferred phenotypic topics.

J Biomed Inform. 2015 Jun;55:82-93. doi: 10.1016/j.jbi.2015.03.011. Epub 2015 Apr 1.

Unfolding Physiological State: Mortality Modelling in Intensive Care Units.

KDD. 2014 Aug 24;2014:75-84. doi: 10.1145/2623330.2623742.

Evaluating the state of the art in disorder recognition and normalization of the clinical narrative.

J Am Med Inform Assoc. 2015 Jan;22(1):143-54. doi: 10.1136/amiajnl-2013-002544. Epub 2014 Aug 21.

Identifying and mitigating biases in EHR laboratory tests.

J Biomed Inform. 2014 Oct;51:24-34. doi: 10.1016/j.jbi.2014.03.016. Epub 2014 Apr 13.

Redundancy-aware topic modeling for patient record notes.

PLoS One. 2014 Feb 13;9(2):e87555. doi: 10.1371/journal.pone.0087555. eCollection 2014.
