Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c.

Affiliations

Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.

Publication information

J Biomed Inform. 2019 Jan;89:56-67. doi: 10.1016/j.jbi.2018.09.001. Epub 2018 Sep 4.

Abstract

Accurate and reliable prediction of clinical progression over time has the potential to improve the outcomes of chronic disease. The classical approach to analyzing longitudinal data is to use (generalized) linear mixed-effect models (GLMM). However, linear parametric models are predicated on assumptions that are often difficult to verify. In contrast, data-driven machine learning methods can be applied to derive insight from the raw data without a priori assumptions. However, the underlying theory of most machine learning algorithms assumes that the data are independent and identically distributed, making them inefficient for longitudinal supervised learning. In this study, we formulate an analytic framework that integrates the random-effects structure of GLMM into non-linear machine learning models capable of exploiting the temporal heterogeneous effects and the sparse, varying-length patient characteristics inherent in longitudinal data. We applied the derived mixed-effect machine learning (MEml) framework to predict longitudinal change in glycemic control, measured by hemoglobin A1c (HbA1c), among well-controlled adults with type 2 diabetes. Results show that MEml is competitive with traditional GLMM and substantially outperformed standard machine learning models that do not account for random effects. Specifically, the accuracy of MEml in predicting glycemic change 1, 2, 3, and 4 clinical visits in advance was 1.04, 1.08, 1.11, and 1.14 times that of the gradient boosted model, respectively, with similar results for the other methods. To further demonstrate the general applicability of MEml, a series of experiments was performed using real publicly available data sets and synthetic data sets to assess accuracy and robustness. These experiments reinforced the superiority of MEml over the other methods. Overall, results from this study highlight the importance of modeling random effects in machine learning approaches based on longitudinal data. Our MEml method is highly resistant to correlated data, readily accounts for random effects, and predicts change of a longitudinal clinical outcome in real-world clinical settings with high accuracy.
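The abstract does not spell out the MEml algorithm, so the following is only a minimal sketch of the general idea it describes: combining a non-linear learner with a random-effects structure. It assumes the simplest case, a single random intercept per patient, and alternates between fitting the learner on the outcome with the current intercepts removed and re-estimating the intercepts and variance components with EM-style updates. The function names (`fit_binned`, `fit_mixed`, `simulate`, `rmse`), the piecewise-constant learner, and the synthetic data are all illustrative assumptions, not the authors' implementation or data.

```python
import math
import random
from collections import defaultdict


def fit_binned(x, target, n_bins=5):
    """Piecewise-constant regressor on x in [0, 1): a stand-in for any non-linear learner."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, ti in zip(x, target):
        k = min(int(xi * n_bins), n_bins - 1)
        sums[k] += ti
        counts[k] += 1
    overall = sum(target) / len(target)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]


def fit_mixed(x, y, subject, n_iter=20):
    """Alternate between the fixed-effect learner and random-intercept updates."""
    groups = defaultdict(list)
    for idx, s in enumerate(subject):
        groups[s].append(idx)
    b = {s: 0.0 for s in groups}  # random intercept per subject
    var_b, var_e = 1.0, 1.0       # initial variance components (assumed starting values)
    for _ in range(n_iter):
        # 1) Fit the learner to the outcome with the current random effects removed.
        f = fit_binned(x, [yi - b[s] for yi, s in zip(y, subject)])
        # 2) Update each intercept as a shrunken mean residual (the BLUP for this model).
        shrink = {}
        for s, idxs in groups.items():
            n = len(idxs)
            shrink[s] = n * var_b / (n * var_b + var_e)
            b[s] = shrink[s] * sum(y[i] - f(x[i]) for i in idxs) / n
        # 3) EM-style updates of the residual and between-subject variances.
        sse = sum((yi - f(xi) - b[s]) ** 2 for xi, yi, s in zip(x, y, subject))
        var_e_new = (sse + sum(var_e * shrink[s] for s in groups)) / len(y)
        var_b = sum(b[s] ** 2 + var_b * (1 - shrink[s]) for s in groups) / len(groups)
        var_e = var_e_new
    return f, b


def simulate(n_subjects=30, n_visits=6, seed=0):
    """Synthetic longitudinal data: non-linear trend plus a subject-level intercept."""
    rng = random.Random(seed)
    x, y, subject = [], [], []
    for s in range(n_subjects):
        b_s = rng.gauss(0.0, 1.0)
        for _ in range(n_visits):
            xi = rng.random()
            x.append(xi)
            y.append(math.sin(2 * math.pi * xi) + b_s + rng.gauss(0.0, 0.3))
            subject.append(s)
    return x, y, subject


def rmse(pred, y):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


x, y, subject = simulate()
plain = fit_binned(x, y)          # learner that treats all rows as independent
f, b = fit_mixed(x, y, subject)   # same learner plus random intercepts
rmse_plain = rmse([plain(xi) for xi in x], y)
rmse_mixed = rmse([f(xi) + b[s] for xi, s in zip(x, subject)], y)
print(f"ignoring subjects: {rmse_plain:.3f}, with random intercepts: {rmse_mixed:.3f}")
```

The comparison printed here is an in-sample fit on simulated data for subjects seen during training, so it only illustrates why ignoring subject-level correlation leaves structure unexplained; it says nothing about the accuracy figures reported in the abstract.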


