Department of Internal Medicine, University of New Mexico School of Medicine, MSC10 5550, 1 University of New Mexico, Albuquerque, NM, 87131, USA.
Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC08 4670, 1 University of New Mexico, Albuquerque, NM, 87131, USA.
BMC Med Res Methodol. 2021 Jul 24;21(1):151. doi: 10.1186/s12874-021-01318-6.
Converting electronic health record (EHR) entries into useful clinical inferences requires addressing the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMMs) for repeated measures. The major computational bottleneck is the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach is a methodologically rigorous framework for estimating GLMMs based on the Laplace Approximation (LA), which replaces integration with numerical optimization and thus scales very well with dimensionality.
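To make the idea of replacing integration with optimization concrete, the following is a minimal sketch of the Laplace Approximation for a single patient in a random-intercept logistic model. It is not the TMB implementation described in this abstract. The function names, the fixed linear predictor `eta0`, the random-effect standard deviation `sigma`, and the toy outcome vector are all assumptions made for illustration. A brute-force trapezoid rule stands in for a reference integrator; it is not AGH quadrature.

```python
import math


def log_joint(u, y, eta0, sigma):
    """h(u) = log p(y | u) + log p(u) for one patient with random intercept u."""
    eta = eta0 + u
    n, s = len(y), sum(y)
    # Bernoulli log-likelihood with logit link, summed over the repeated measures
    loglik = s * eta - n * math.log1p(math.exp(eta))
    # Gaussian random-effect log-density, N(0, sigma^2)
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - u ** 2 / (2 * sigma ** 2)
    return loglik + logprior


def gradient_and_hessian(u, y, eta0, sigma):
    """First and second derivatives of h(u) with respect to u."""
    n, s = len(y), sum(y)
    p = 1.0 / (1.0 + math.exp(-(eta0 + u)))
    grad = s - n * p - u / sigma ** 2
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2  # always negative: h is concave
    return grad, hess


def laplace_log_marginal(y, eta0, sigma, max_iter=100, tol=1e-12):
    """Approximate log of the integral of exp(h(u)) du by optimizing h."""
    u = 0.0
    for _ in range(max_iter):  # Newton iterations to find the mode of h
        grad, hess = gradient_and_hessian(u, y, eta0, sigma)
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    _, hess = gradient_and_hessian(u, y, eta0, sigma)
    # Laplace formula in one dimension: h(u_hat) + 0.5*log(2*pi) - 0.5*log(-h''(u_hat))
    log_marginal = (log_joint(u, y, eta0, sigma)
                    + 0.5 * math.log(2 * math.pi)
                    - 0.5 * math.log(-hess))
    return log_marginal, u


def brute_force_log_marginal(y, eta0, sigma, lo=-10.0, hi=10.0, n_grid=20001):
    """Reference value from a trapezoid rule on a fine grid (feasible only in 1-D)."""
    width = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        u = lo + i * width
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_joint(u, y, eta0, sigma))
    return math.log(total * width)


if __name__ == "__main__":
    y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # toy repeated binary outcomes, not real data
    la, u_hat = laplace_log_marginal(y, eta0=-1.0, sigma=1.0)
    ref = brute_force_log_marginal(y, eta0=-1.0, sigma=1.0)
    print(f"mode of random effect: {u_hat:.4f}")
    print(f"Laplace log-marginal:  {la:.4f}")
    print(f"grid log-marginal:     {ref:.4f}")
```

In this one-dimensional toy case the grid integral is cheap, so it can serve as a check on the approximation. With one random effect per patient the grid approach becomes infeasible, whereas the optimization step still only requires the mode and curvature of the joint log-likelihood.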
We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires evaluating an integral in over 3 million dimensions, which puts the problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, 1% and 10% of the full dataset, which were analyzed via a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, b) Adaptive Gauss-Hermite (AGH) quadrature, and c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration.
Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4-30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the non-linear relationship between the potassium level and the risk of mortality, and estimates of the individual and health care facility sources of variation in mortality risk in CRWD.
We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate approach to GLMM analysis of extremely large, real-world repeated measures data. The clinical inferences from our analysis may guide the choice of thresholds for treating potassium disorders in the clinic.