Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, 6000, Koper, Slovenia.
Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia.
Sci Rep. 2020 Jul 20;10(1):11981. doi: 10.1038/s41598-020-68771-z.
Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data has created the opportunity to develop more complex, more accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (Glmnet, RF, XGBoost, LightGBM) with commonly used regression models for the prediction of undiagnosed T2DM. Performance in predicting fasting plasma glucose level was measured using 100 bootstrap iterations on different subsets of the data, simulating new data arriving in 6-month batches. With 6 months of data available, the simple regression model achieved the lowest average RMSE (0.838), followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved at the highest rate (+3.4%). The highest level of variable selection stability over time was observed with the LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of the selected variables over time makes the models simpler to interpret, interpretability and model calibration should also be considered in the development of clinical prediction models.
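The evaluation described in the abstract (average RMSE over 100 bootstrap iterations, repeated as more data become available) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the model is an ordinary least-squares fit, the batch sizes are arbitrary, and scoring on out-of-bag rows is one common bootstrap scheme that the abstract does not confirm the authors used.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def bootstrap_rmse(X, y, n_iter=100, seed=0):
    """Average out-of-bag RMSE of a least-squares fit over n_iter bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)        # rows drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows never drawn: used for testing
        if oob.size == 0:
            continue
        # Fit intercept + slopes on the bootstrap sample.
        A_train = np.column_stack([np.ones(idx.size), X[idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[idx], rcond=None)
        # Score on the out-of-bag rows.
        A_test = np.column_stack([np.ones(oob.size), X[oob]])
        scores.append(rmse(y[oob], A_test @ coef))
    return float(np.mean(scores))


# Synthetic stand-in for the clinical data (hypothetical predictors and outcome).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 5.5 + X @ np.array([0.3, -0.2, 0.1]) + rng.normal(scale=0.8, size=500)

# Simulate data arriving in batches: re-evaluate as each batch of 100 rows is added.
results = {n: bootstrap_rmse(X[:n], y[:n]) for n in range(100, 501, 100)}
for n, score in results.items():
    print(f"rows available: {n:3d}  mean bootstrap RMSE: {score:.3f}")
```

Comparing several model families under this scheme would mean swapping the least-squares fit for each candidate model while keeping the resampling and the scoring identical, so that differences in RMSE reflect the model rather than the evaluation.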