Development and Validation of Machine Learning Models for Identifying Prediabetes and Diabetes in Normoglycemia.

Author information

Postgraduate Department, Shandong First Medical University (Shandong Academy of Medical Sciences), Jinan, China.

Department of Anesthesiology, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China.

Publication information

Diabetes Metab Res Rev. 2024 Nov;40(8):e70003. doi: 10.1002/dmrr.70003.

DOI:10.1002/dmrr.70003

PMID:39497474

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601146/

Abstract

BACKGROUND

Prediabetes and diabetes are both states of abnormal glucose metabolism (AGM) that can lead to severe complications. Early detection of AGM is crucial for timely intervention and treatment. However, fasting blood glucose (FBG), when used as a mass population screening method, may fail to identify individuals who have AGM despite a normal FBG (normoglycemia). This study aimed to develop and validate machine learning (ML) models that identify AGM among individuals with normoglycemia using routine health check-up indicators.

METHODS

Participants with normoglycemia according to the American Diabetes Association (ADA) criteria (FBG ≤ 5.6 mmol/L) were collected from 2019 to 2023 and divided into AGM and Normal groups using a glycosylated haemoglobin (HbA1c) threshold of 5.7%. Data from 2019 to 2022 were split into training and internal validation sets at a 7:3 ratio, while data from 2023 served as the external validation set. Seven ML algorithms were used to build models for identifying AGM in the normoglycemic population: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR). Feature contributions to the optimal model were visualised using SHapley Additive exPlanations (SHAP). Finally, an intuitive, user-friendly interactive interface was developed.
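The cohort construction described above (FBG filter, HbA1c labelling, 7:3 split of the 2019 to 2022 data, 2023 held out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Record` layout, the function names, the random seed, and the choice to count an HbA1c of exactly 5.7% as AGM are all hypothetical, and the demo records are synthetic.

```python
import random
from dataclasses import dataclass

FBG_NORMAL_MAX = 5.6    # mmol/L, normoglycemia bound quoted in the abstract
HBA1C_THRESHOLD = 5.7   # %, AGM/Normal threshold quoted in the abstract


@dataclass
class Record:           # hypothetical record layout, for illustration only
    year: int
    fbg: float          # fasting blood glucose, mmol/L
    hba1c: float        # glycosylated haemoglobin, %


def label_agm(rec: Record) -> int:
    """Return 1 for AGM, 0 for Normal.

    Assumption: an HbA1c exactly at the threshold counts as AGM.
    """
    return 1 if rec.hba1c >= HBA1C_THRESHOLD else 0


def split_cohort(records, train_fraction=0.7, seed=0):
    """Return (training, internal validation, external validation) lists."""
    # Keep only participants whose FBG is in the normoglycemic range.
    eligible = [r for r in records if r.fbg <= FBG_NORMAL_MAX]
    # 2019-2022 is the development data; 2023 is held out for external validation.
    development = [r for r in eligible if 2019 <= r.year <= 2022]
    external = [r for r in eligible if r.year == 2023]
    # Shuffle reproducibly, then cut at the 7:3 boundary.
    rng = random.Random(seed)
    rng.shuffle(development)
    cut = round(len(development) * train_fraction)
    return development[:cut], development[cut:], external


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic records only; these are not study data.
    records = [
        Record(year=rng.randint(2019, 2023),
               fbg=round(rng.uniform(4.0, 6.5), 1),
               hba1c=round(rng.uniform(4.8, 6.6), 1))
        for _ in range(1000)
    ]
    train, internal, external = split_cohort(records)
    for name, part in [("training", train), ("internal", internal),
                       ("external", external)]:
        n_agm = sum(label_agm(r) for r in part)
        print(f"{name}: n={len(part)}, AGM={n_agm}")
```

Holding out the most recent year, rather than a random subset, tests whether a model fitted on earlier check-ups still works on later ones, which is closer to how a screening tool would be used.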

RESULTS

A total of 59,259 participants were enrolled and divided into a training set of 32,810, an internal validation set of 14,060, and an external validation set of 12,389. The CatBoost model outperformed the others, with an auROC of 0.806 on the internal validation set and 0.794 on the external validation set. Age was the most important feature in the CatBoost model, followed by fasting blood glucose, red blood cells, haemoglobin, body mass index, and the triglyceride-glucose index.
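For readers less familiar with the two metrics, the following minimal Python sketch shows what auROC and auPR measure. It is a plain-Python illustration, not the authors' evaluation code. The function names are hypothetical, auPR is estimated here as average precision (one common estimator; the abstract does not say which one was used), and the four example scores are toy values rather than study data.

```python
def au_roc(labels, scores):
    """auROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def au_pr(labels, scores):
    """auPR estimated as average precision.

    Average precision is the mean of the precision values measured at the
    rank of each positive case. Assumes there are no tied scores.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank   # precision among the top `rank` cases
    return total / n_pos


if __name__ == "__main__":
    # Toy example: 1 = AGM, 0 = Normal; scores are predicted probabilities.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(f"auROC = {au_roc(y_true, y_score):.3f}")   # 3 of 4 pairs ranked correctly
    print(f"auPR  = {au_pr(y_true, y_score):.3f}")
```

The baseline for auPR is the fraction of positive cases, whereas the baseline for auROC is always 0.5, which is why the two are often reported together when the classes are imbalanced.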

CONCLUSION

A well-performing ML model for identifying AGM in the normoglycemic population was built, offering considerable potential for early intervention and treatment of AGM cases that would otherwise have been missed.
