Meng Rui, Soper Braden, Lee Herbert K H, Liu Vincent X, Greene John D, Ray Priyadip
Department of Statistics, University of California, Santa Cruz, CA, United States.
Lawrence Livermore National Laboratory, Livermore, CA, United States.
J Biomed Inform. 2021 May;117:103698. doi: 10.1016/j.jbi.2021.103698. Epub 2021 Feb 19.
Advances in the modeling and analysis of electronic health records (EHR) have the potential to improve patient risk stratification, leading to better patient outcomes. The modeling of complex temporal relations across the multiple clinical variables inherent in EHR data remains largely unexplored. Existing approaches to modeling EHR data often lack the flexibility to handle time-varying correlations across multiple clinical variables, or they are too complex for clinical interpretation. We therefore propose a novel nonstationary multivariate Gaussian process model for EHR data that addresses these drawbacks. The proposed model captures time-varying scale, correlation, and smoothness across multiple clinical variables. We also provide details on two inference approaches: maximum a posteriori (MAP) estimation and Hamiltonian Monte Carlo. We validate the model on synthetic data and then demonstrate its effectiveness on EHR data from the Kaiser Permanente Division of Research (KPDOR). Finally, we use the KPDOR EHR data to investigate the relationship between a clinical patient risk metric and the latent processes of the proposed model, and we demonstrate statistically significant correlations between them.
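To make "time-varying scale, correlation and smoothness" concrete, the following is a minimal illustrative sketch and not the model from the paper. It assumes a Gibbs kernel with an input-dependent length-scale for smoothness, and a per-time lower-triangular mixing matrix L(t) applied to two independent latent Gaussian processes for scale and correlation. All function names (`gibbs_kernel`, `mixing`, `sample_outputs`) and the parameter functions `ell`, `s1`, `s2`, `rho` are hypothetical choices made for illustration.

```python
import numpy as np

def gibbs_kernel(t, ell):
    """Nonstationary Gibbs kernel on 1-D inputs t with length-scale function ell(t).

    Assumption: chosen here only as a standard nonstationary kernel; its diagonal is 1.
    """
    l = ell(t)
    sq = l[:, None] ** 2 + l[None, :] ** 2
    pref = np.sqrt(2.0 * l[:, None] * l[None, :] / sq)
    return pref * np.exp(-((t[:, None] - t[None, :]) ** 2) / sq)

def mixing(t, s1, s2, rho):
    """Per-time 2x2 lower-triangular factors L(t).

    L(t) L(t)^T has standard deviations s1(t), s2(t) and correlation rho(t).
    """
    r = rho(t)
    L = np.zeros((t.size, 2, 2))
    L[:, 0, 0] = s1(t)
    L[:, 1, 0] = s2(t) * r
    L[:, 1, 1] = s2(t) * np.sqrt(1.0 - r ** 2)
    return L

def sample_outputs(t, ell, s1, s2, rho, seed=0):
    """Draw one path of two correlated outputs y(t) = L(t) f(t)."""
    rng = np.random.default_rng(seed)
    # Small jitter keeps the Cholesky factorization numerically stable.
    K = gibbs_kernel(t, ell) + 1e-6 * np.eye(t.size)
    chol = np.linalg.cholesky(K)
    # Two independent latent GPs sharing the same kernel, shape (n, 2).
    f = chol @ rng.standard_normal((t.size, 2))
    L = mixing(t, s1, s2, rho)
    y = np.einsum("nij,nj->ni", L, f)
    return y, L
```

In this sketch, `ell` controls how quickly the latent paths vary at each time point, while `s1`, `s2`, and `rho` set the marginal scales and the instantaneous cross-correlation of the two outputs. Inference over these functions (for example by MAP or Hamiltonian Monte Carlo) is not shown.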