Liu Hongzhou, Dong Song, Yang Hua, Wang Linlin, Liu Jia, Du Yangfan, Liu Jing, Lyu Zhaohui, Wang Yuhan, Jiang Li, Yu Shasha, Fu Xiaomin
Department of Endocrinology, Aerospace Center Hospital, Beijing, China.
Department of Endocrinology, First Hospital of Handan City, Handan, China.
J Int Med Res. 2024 Jun;52(6):3000605241253786. doi: 10.1177/03000605241253786.
To evaluate the effectiveness of machine learning (ML) models in predicting 5-year type 2 diabetes mellitus (T2DM) risk within the Chinese population by retrospectively analyzing annual health checkup records.
We included 46,247 patients (32,372 in the training set and 13,875 in the validation set) from a national health checkup center database. Univariate and multivariate Cox analyses were performed to identify factors influencing T2DM risk. Extreme Gradient Boosting (XGBoost), support vector machine (SVM), logistic regression (LR), and random forest (RF) models were trained to predict 5-year T2DM risk. Model performance was assessed using receiver operating characteristic (ROC) curves for discrimination and calibration plots for prediction accuracy.
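As a rough illustration of the discrimination measure named above, the sketch below computes the area under the ROC curve as the probability that a randomly chosen case outscores a randomly chosen non-case. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and only the set sizes come from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = developed T2DM within 5 years, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(labels, scores)
print(toy_auc)  # 0.75: three of the four positive/negative pairs are ranked correctly

# Set sizes reported in the abstract imply a split of roughly 70/30.
train_fraction = 32372 / (32372 + 13875)
print(round(train_fraction, 2))
```

The pairwise loop is quadratic in sample size, so it suits small illustrations only; a rank-based implementation would be the usual choice for a cohort of this size.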
Key variables included fasting plasma glucose, age, and sedentary time. The LR model showed good accuracy, with areas under the ROC curve (AUCs) of 0.914 and 0.913 in the training and validation sets, respectively; the RF model had AUCs of 0.998 and 0.838. In calibration analysis, the LR model displayed good fit for low-risk patients; the RF model exhibited satisfactory fit for both low- and high-risk patients.
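The calibration comparison can be pictured as a table of predicted versus observed risk per risk band. The sketch below is a minimal version under stated assumptions: equal-width probability bins, a hypothetical helper `calibration_table`, and invented toy data. The abstract does not say how the authors binned their calibration plots.

```python
def calibration_table(labels, probs, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # a probability of exactly 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Toy data (hypothetical): outcomes and predicted 5-year risks for eight people.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.85, 0.90]

# Two bins stand in for a "low-risk" and a "high-risk" band.
rows = calibration_table(labels, probs, n_bins=2)
for mean_pred, observed, count in rows:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

A model is well calibrated in a band when the mean predicted risk is close to the observed event rate there, which is the sense in which fit can differ between low- and high-risk patients.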
LR and RF models can effectively predict T2DM risk in the Chinese population. These models may help identify high-risk patients and guide interventions to prevent complications and disabilities.