Department of Geriatrics, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China.
Physical Examination Center, The Affiliated Suzhou Hospital of Nanjing University Medical School, Suzhou, Jiangsu, China.
Front Endocrinol (Lausanne). 2024 Sep 24;15:1368225. doi: 10.3389/fendo.2024.1368225. eCollection 2024.
The aim of this study was to develop and validate a machine learning-based model, using data from a cohort study, to predict the development of impaired fasting glucose (IFG) in middle-aged and elderly people over a 5-year period.
This was a retrospective cohort study of 1,855 participants who underwent consecutive physical examinations at the First Affiliated Hospital of Soochow University between 2018 and 2022. The dataset included medical history, physical examination findings, and biochemical test results. The cohort was randomly divided into a training set and a validation set in an 8:2 ratio. The algorithms compared were Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), Naive Bayes, Decision Trees (DT), and traditional Logistic Regression (LR). Feature selection, parameter optimization, and model construction were performed in the training set, and the validation set was used to evaluate the predictive performance of the models. Performance was assessed with the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curve analysis (DCA). Shapley Additive exPlanations (SHAP) plots were used to interpret the best-performing model.
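The split and two of the evaluation measures described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code (a study like this would normally rely on a statistics or machine learning library); the function names, the fixed random seed, and the toy data are assumptions made for illustration. It shows a random 8:2 split of row indices, the AUC computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counted as one half), and the net benefit that decision curve analysis plots across threshold probabilities.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into training (80%) and validation (20%) sets.

    The seed value is an illustrative assumption, not taken from the study.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.8 * n))
    return idx[:cut], idx[cut:]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt).

    Cases with predicted risk >= pt are treated as predicted positive.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    train, valid = split_80_20(1855)
    print(len(train), len(valid))  # 1484 371

    # Toy outcome labels (1 = developed IFG) and predicted risks; hypothetical data.
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, risk))  # 0.75
    print(net_benefit(y, risk, 0.3))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.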
In the training/validation dataset of 1,855 individuals from the First Affiliated Hospital of Soochow University, selection with the Boruta algorithm and multivariate logistic regression analysis identified four significant variables: systolic blood pressure (SBP), fatty liver, waist circumference (WC), and serum creatinine (Scr). The XGBoost model outperformed the other models, with an AUC of 0.7391 in the validation set.
The XGBoost model, composed of SBP, fatty liver, WC, and Scr, may assist doctors in the early identification of IFG in middle-aged and elderly people.