Pivovarov Rimma, Perotte Adler J, Grave Edouard, Angiolillo John, Wiggins Chris H, Elhadad Noémie
Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
College of Physicians and Surgeons, Columbia University, New York, NY, USA.
J Biomed Inform. 2015 Dec;58:156-165. doi: 10.1016/j.jbi.2015.10.001. Epub 2015 Oct 14.
We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion. We apply UPhenome to two qualitatively different mixtures of patients and diseases: records of extremely sick patients in the intensive care unit with constant monitoring, and records of outpatients regularly followed by care providers over multiple years. We demonstrate that the UPhenome model can learn from these different care settings without any additional adaptation. Our experiments show that (i) the learned phenotypes combine the heterogeneous data types more coherently than baseline LDA-based phenotypes; (ii) each learned phenotype represents a single disease, rather than a mix of diseases, more often than the baseline phenotypes do; and (iii) when applied to unseen patient records, the learned phenotypes are correlated with the patients' ground-truth disorders. Code for training, inference, and quantitative evaluation is made available to the research community.