Predicting isolated impaired glucose tolerance without oral glucose tolerance test using machine learning in Chinese Han men.

Author information

Wang Lin, Xie Jing, Gu Zhaoyan, Miao Xinyu, Ma Lichao, Yan Shuangtong, Gong Yanping, Li Chunlin, Sun Banruo, Ruan Yue

Affiliations

Department of Endocrinology, Second Medical Center, Chinese People's Liberation Army General Hospital, National Clinical Research Center for Geriatric Diseases, Beijing, China.

Department of Special Medical Service, Ninth Medical Center, Chinese People's Liberation Army General Hospital, Beijing, China.

Publication information

Front Endocrinol (Lausanne). 2025 Apr 24;16:1514397. doi: 10.3389/fendo.2025.1514397. eCollection 2025.

DOI:10.3389/fendo.2025.1514397

PMID:40343071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12058868/

Abstract

BACKGROUND

Isolated impaired glucose tolerance (I-IGT) is a specific prediabetic state whose diagnosis typically requires a standardized oral glucose tolerance test (OGTT). This study aims to predict glucose tolerance status in Chinese Han men from fasting-state data, using machine learning (ML) models built on demographic, anthropometric, and laboratory variables.

METHODS

The study population consisted of 1,117 Chinese Han men aged 50-87 years. Baseline variables, including age, fasting plasma glucose (FPG), high blood pressure (HBP), body mass index (BMI), waist-to-hip ratio (WHR), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), were collected from electronic medical records (EMRs) for model training and validation. Eight classifiers were compared: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), Adaptive Boosting (AdaBoost), and Gradient Boosting Machines (GBM). Model performance was evaluated using accuracy, recall, F1 score, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) and confusion matrix plots were used for model interpretation.
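All of the threshold-based metrics named above (accuracy, recall, F1 score, PPV, and NPV) can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, assuming I-IGT is treated as the positive class. The function name `binary_metrics`, the helper `_ratio`, and the example counts are hypothetical illustrations; they are not the study's code or data.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute screening metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Treating I-IGT as the positive class is an
    assumption made for this illustration.
    """
    recall = _ratio(tp, tp + fn)   # sensitivity: share of true cases found
    ppv = _ratio(tp, tp + fp)      # precision: share of positive calls that are right
    npv = _ratio(tn, tn + fn)      # share of negative calls that are right
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * ppv * recall, ppv + recall)  # harmonic mean of PPV and recall
    return {
        "accuracy": accuracy,
        "recall": recall,
        "f1": f1,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts, NOT taken from the study.
example = binary_metrics(tp=80, fp=10, tn=100, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included here because it depends on the ranking of predicted scores rather than on a single confusion matrix.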

RESULTS

The RF model demonstrated the best overall performance, with an accuracy of 96.7%, recall of 91.4%, F1 score of 95.7%, PPV of 99.1%, and NPV of 95.6%. The AUC values for the SVM, DT, RF, LR, KNN, NB, AdaBoost, and GBM models were 0.97, 0.92, 0.96, 0.97, 0.88, 0.88, 0.97, and 0.97, respectively. Although the RF model showed the strongest overall performance, the LR model reached the highest reported AUC (0.97, a value shared with SVM, AdaBoost, and GBM), indicating strong discriminatory power. FPG was identified as the most important predictor of I-IGT, followed by HDL-C, TC, HBP, BMI, and WHR. Individuals with FPG levels higher than 5.1 mmol/L were more likely to have I-IGT; the performance metrics for this cut-off value were 89.35% accuracy, 89.79% recall, 85.22% F1 score, 81.09% PPV, 94.38% NPV, and 0.95 AUC.
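The single-variable rule described above amounts to flagging anyone whose FPG exceeds 5.1 mmol/L. The sketch below applies that rule and checks that the reported F1 score for the cut-off agrees with the reported recall and PPV, since F1 is their harmonic mean. The names `flag_by_fpg` and `f1_from` and the sample FPG readings are hypothetical; only the 5.1 mmol/L cut-off and the percentages come from the abstract.

```python
FPG_CUTOFF_MMOL_L = 5.1  # cut-off value reported in the abstract


def flag_by_fpg(fpg_mmol_l: float, cutoff: float = FPG_CUTOFF_MMOL_L) -> bool:
    """Return True when FPG is strictly above the cut-off (screen positive)."""
    return fpg_mmol_l > cutoff


def f1_from(recall: float, ppv: float) -> float:
    """Return the harmonic mean of recall and PPV (precision)."""
    return 2 * recall * ppv / (recall + ppv)


# Hypothetical FPG readings in mmol/L, not study data.
readings = [4.8, 5.1, 5.4, 6.0]
flags = [flag_by_fpg(value) for value in readings]
print(flags)

# Reported recall and PPV for the cut-off, both in percent.
f1_check = round(f1_from(89.79, 81.09), 2)
print(f1_check)
```

A strict `>` comparison is used because the abstract describes levels "higher than" 5.1 mmol/L; whether a reading of exactly 5.1 should count as positive is not stated there.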

CONCLUSION

Machine learning models based on demographic and clinical characteristics offer a cost-effective method for predicting I-IGT in Chinese Han men aged over 50, without the need for an OGTT. These models could complement existing early diagnostic strategies, thereby enhancing the early detection and prevention of diabetes. Additionally, FPG alone could serve as an efficient screening tool for the early identification of I-IGT in clinical settings.
