High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets.

Author affiliations

Department of Internal Medicine, University of New Mexico School of Medicine, MSC10 5550 1 University of New Mexico Albuquerque, Albuquerque, NM, 87131, USA.

Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine MSC08 4670 1 University of New Mexico Albuquerque, Albuquerque, NM, 87131, USA.

Publication information

BMC Med Res Methodol. 2021 Jul 24;21(1):151. doi: 10.1186/s12874-021-01318-6.

DOI:10.1186/s12874-021-01318-6

PMID:34303362

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8310602/

Abstract

BACKGROUND

Converting electronic health record (EHR) entries to useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach to GLMMs is a methodologically rigorous framework for the estimation of GLMMs that is based on the Laplace Approximation (LA), which replaces integration with numerical optimization, and thus scales very well with dimensionality.
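The idea of replacing integration with optimization can be made concrete with the standard form of the Laplace approximation. The notation below is an assumption introduced for illustration and is not taken from the paper: y denotes the observed data, u the q-dimensional vector of random effects (one or more per patient), and θ the fixed effects and variance parameters.

```latex
% Marginal likelihood as a q-dimensional integral over the random effects
L(\theta) = \int_{\mathbb{R}^{q}} \exp\{h(\theta, u)\}\, du,
\qquad
h(\theta, u) = \log f(y \mid u; \theta) + \log f(u; \theta)

% Laplace approximation: optimize over u instead of integrating
\hat{u}_{\theta} = \arg\max_{u} h(\theta, u),
\qquad
\log L(\theta) \approx h(\theta, \hat{u}_{\theta})
  + \frac{q}{2}\log(2\pi)
  - \frac{1}{2}\log\det\!\left\{ -\nabla_{u}^{2} h(\theta, u) \Big|_{u = \hat{u}_{\theta}} \right\}
```

The integral over q dimensions is thus traded for an inner maximization over u plus a log-determinant of the q × q negative Hessian. In repeated-measures models this Hessian is typically sparse, which is what makes the cost grow slowly with the number of patients.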

METHODS

We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, putting this problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, 1% and 10% of the full dataset, which were analyzed via a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, b) Adaptive Gauss-Hermite (AGH) quadrature, and c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration.
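The comparison between the LA and AGH quadrature can be illustrated on a toy problem. The following is a minimal Python sketch, not the authors' TMB implementation: it assumes a single patient with eight binary outcomes and a logistic model with one Gaussian random intercept, so the integral is one-dimensional. All function names, the data, and the parameter values (`beta`, `sigma`) are hypothetical and chosen only for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def log_joint(u, y, beta, sigma):
    """log f(y | u) + log f(u) for a random-intercept logistic model (toy assumption)."""
    eta = beta + u
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * (u / sigma) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
    return loglik + logprior


def find_mode(y, beta, sigma, iters=50):
    """Newton iterations for the mode of log_joint in u; returns (mode, curvature)."""
    u = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = np.sum(y - p) - u / sigma ** 2
        curv = y.size * p * (1.0 - p) + 1.0 / sigma ** 2  # negative second derivative
        u += grad / curv
    p = 1.0 / (1.0 + np.exp(-(beta + u)))
    curv = y.size * p * (1.0 - p) + 1.0 / sigma ** 2
    return u, curv


def laplace_logmarg(y, beta, sigma):
    """Laplace approximation to the log marginal likelihood (q = 1)."""
    u_hat, curv = find_mode(y, beta, sigma)
    return log_joint(u_hat, y, beta, sigma) + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(curv)


def agh_logmarg(y, beta, sigma, n_nodes):
    """Adaptive Gauss-Hermite quadrature centred and scaled at the mode."""
    u_hat, curv = find_mode(y, beta, sigma)
    scale = 1.0 / np.sqrt(curv)
    x, w = hermgauss(n_nodes)  # nodes and weights for the weight function exp(-x^2)
    u = u_hat + np.sqrt(2.0) * scale * x
    logf = np.array([log_joint(ui, y, beta, sigma) for ui in u])
    terms = np.log(w) + x ** 2 + logf
    m = terms.max()  # log-sum-exp for numerical stability
    return np.log(np.sqrt(2.0) * scale) + m + np.log(np.sum(np.exp(terms - m)))


def brute_logmarg(y, beta, sigma, half_width=10.0, n_grid=20001):
    """Reference value from a dense Riemann sum (only feasible in low dimension)."""
    grid = np.linspace(-half_width, half_width, n_grid)
    step = grid[1] - grid[0]
    vals = np.array([np.exp(log_joint(u, y, beta, sigma)) for u in grid])
    return np.log(np.sum(vals) * step)


# Hypothetical toy data: one patient, eight binary outcomes.
y = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=float)
beta, sigma = -0.5, 1.2

lap = laplace_logmarg(y, beta, sigma)
agh1 = agh_logmarg(y, beta, sigma, 1)
agh25 = agh_logmarg(y, beta, sigma, 25)
ref = brute_logmarg(y, beta, sigma)
print("Laplace:", lap, "AGH(1):", agh1, "AGH(25):", agh25, "reference:", ref)
```

With a single node, AGH reduces to the Laplace approximation, and adding nodes refines it. The catch is that a tensor-product quadrature needs a number of evaluations that grows exponentially with the dimension of the integral, whereas the LA needs only the mode and the curvature at the mode.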

RESULTS

Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4-30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of the non-linear relationship between the potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variation in mortality risk in CRWD.

CONCLUSIONS

We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate approach to GLMMs for the analysis of extremely large, real-world repeated measures data. The clinical inferences from our analysis may guide the choice of treatment thresholds for potassium disorders in the clinic.
