School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, H3A0E9, Canada.
Department of Physiology and Biomedical Engineering and Division of Gastroenterology and Hepatology, Department of Medicine, and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA.
Nat Commun. 2020 May 21;11(1):2536. doi: 10.1038/s41467-020-16378-3.
Electronic health records (EHRs) are rich, heterogeneous collections of patient health information whose broad adoption provides clinicians and researchers with unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy and irregular clinical notes, arbitrary biases in billing-code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on the MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe higher accuracy in predicting diagnostic codes and imputing lab tests than state-of-the-art methods achieve. We leverage the inferred patient topic mixtures to classify target diseases and to predict mortality in critically ill patients. In all comparisons, MixEHR confers competitive performance and reveals meaningful disease-related topics.
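The abstract does not give MixEHR's equations, so the following is only a minimal sketch of what "multi-view topic model" generally means, assuming a generic multi-view LDA: each patient has one topic mixture theta shared across all data types (views), and each view has its own topic-to-feature distributions. This simplification omits MixEHR's specific treatment of lab tests and missing data and is not the authors' implementation; the function names, view names, and hyperparameters (`alpha`, `beta`) are hypothetical.

```python
import random
from typing import Dict, List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs: List[float], rng: random.Random) -> int:
    """Return an index sampled in proportion to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_patient(
    n_topics: int,
    vocab_sizes: Dict[str, int],
    tokens_per_view: Dict[str, int],
    alpha: float = 0.1,
    beta: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Simulate one patient record from a generic (hypothetical) multi-view LDA."""
    rng = random.Random(seed)
    # Each view (e.g. billing codes, note words) has its own K topic-feature distributions.
    phi = {
        view: [sample_dirichlet([beta] * size, rng) for _ in range(n_topics)]
        for view, size in vocab_sizes.items()
    }
    # A single topic mixture per patient, shared by every view: this is what ties
    # heterogeneous data types together into common "disease topics".
    theta = sample_dirichlet([alpha] * n_topics, rng)
    record: Dict[str, List[int]] = {}
    for view, n_tokens in tokens_per_view.items():
        tokens = []
        for _ in range(n_tokens):
            z = sample_categorical(theta, rng)                     # topic for this token
            tokens.append(sample_categorical(phi[view][z], rng))   # feature id within the view
        record[view] = tokens
    return record


record = generate_patient(3, {"icd": 20, "notes": 50}, {"icd": 5, "notes": 30}, seed=1)
print({view: len(tokens) for view, tokens in record.items()})
```

Under these assumptions, inference runs this process in reverse: given observed records, it estimates each patient's theta. Those per-patient mixtures are the kind of low-dimensional features the abstract describes using to classify target diseases and predict mortality.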