Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo University, Ningbo, China.
Health Science Center, Ningbo University, Ningbo, China.
Front Endocrinol (Lausanne). 2024 Feb 27;15:1332982. doi: 10.3389/fendo.2024.1332982. eCollection 2024.
Cardiovascular disease (CVD) has emerged as a global public health concern. Identifying and preventing subclinical atherosclerosis (SCAS), an early indicator of CVD, is critical for improving cardiovascular outcomes. This study aimed to construct interpretable machine learning models for predicting SCAS risk in type 2 diabetes mellitus (T2DM) patients.
This study included 3084 individuals with T2DM who received health care at Zhenhai Lianhua Hospital, Ningbo, China, from January 2018 to December 2022. The least absolute shrinkage and selection operator (LASSO), combined with random forest-recursive feature elimination (RF-RFE), was used to screen for characteristic variables. Linear discriminant analysis, logistic regression, Naive Bayes, random forest, support vector machine, and extreme gradient boosting were used to construct risk prediction models for SCAS in T2DM patients. The predictive capacity of each model was assessed by the area under the receiver operating characteristic curve (AUC) using 10-fold cross-validation. Additionally, SHapley Additive exPlanations (SHAP) were used to interpret the best-performing model.
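The abstract does not include the evaluation code. The following is a minimal, standard-library-only sketch of 10-fold cross-validated AUC, written under assumptions the abstract does not confirm: folds are stratified by SCAS status, and a plain gradient-descent logistic regression stands in for any of the six classifiers. The function names (`roc_auc`, `stratified_folds`, `fit_logistic`, `cross_validated_auc`) are hypothetical, the demo data are synthetic, and the LASSO + RF-RFE screening step and the 95% confidence intervals are not reproduced.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Assumption: a fitted model is represented as a function from one feature vector to a risk score.
Scorer = Callable[[Sequence[float]], float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(a random positive scores higher than a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin so every fold keeps roughly the same class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[counter % k].append(i)
            counter += 1
    return folds


def fit_logistic(X, y, lr: float = 0.1, epochs: int = 100) -> Scorer:
    """Batch gradient-descent logistic regression (hypothetical stand-in for the study's models)."""
    m = len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            # Numerically stable sigmoid.
            p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            err = p - yi
            for j in range(m):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    # The linear score is monotone in the predicted probability, so it gives the same AUC.
    return lambda x: b + sum(wj * xj for wj, xj in zip(w, x))


def cross_validated_auc(X, y, fit: Callable[..., Scorer], k: int = 10,
                        seed: int = 0) -> Tuple[float, List[float]]:
    """Train on k-1 folds, score the held-out fold, and average the k held-out AUCs."""
    per_fold = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train], [y[i] for i in train])
        per_fold.append(roc_auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx]))
    return sum(per_fold) / k, per_fold


if __name__ == "__main__":
    # Synthetic data only (not study data): feature 0 is informative, feature 1 is noise.
    rng = random.Random(1)
    y = [1 if rng.random() < 0.3846 else 0 for _ in range(300)]
    X = [[2.0 * yi + rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
    mean_auc, _ = cross_validated_auc(X, y, fit_logistic)
    print(f"10-fold CV AUC: {mean_auc:.3f}")
```

Stratifying keeps each held-out fold near the cohort's SCAS rate (38.46%), so every fold contains both classes and its AUC is defined; whether the authors stratified is not stated in the abstract.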
The prevalence of SCAS in the study population was 38.46% (n=1186). Fourteen variables, including age, white blood cell count, and basophil count, were identified as independent risk factors for SCAS. Nine predictors, including age, albumin, and total protein, were selected for constructing the risk prediction models. After validation, the random forest model showed the best clinical predictive value in the training set, with an AUC of 0.729 (95% CI: 0.709-0.749), and it also demonstrated good predictive value in the internal validation set [AUC: 0.715 (95% CI: 0.688-0.742)]. Model interpretation revealed that age, albumin, total protein, total cholesterol, and serum creatinine were the five variables contributing most to the prediction model.
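SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory, and a global ranking such as the top-five list above is commonly obtained by averaging the absolute SHAP values over samples. The sketch below computes Shapley values exactly by enumerating feature coalitions, assuming a single reference point (the feature means) stands in for "absent" features. This is a brute-force simplification, not the authors' procedure: for a random forest, the `shap` package would typically use its tree-specific algorithm with a background dataset. The function names, the toy model, and its rows are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def shapley_values(predict: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for one sample.

    A coalition S is evaluated as predict(z), where z takes x's value for features
    in S and the baseline value for all others (single-reference assumption).
    """
    m = len(x)

    def value(coalition: set) -> float:
        return predict([x[j] if j in coalition else baseline[j] for j in range(m)])

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions S of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def rank_by_mean_abs_shap(predict: Model, X: Sequence[Sequence[float]],
                          names: Sequence[str]) -> List[Tuple[str, float]]:
    """Global importance = mean |SHAP| over samples, with feature means as the baseline."""
    m = len(names)
    baseline = [sum(row[j] for row in X) / len(X) for j in range(m)]
    totals = [0.0] * m
    for x in X:
        for j, phi in enumerate(shapley_values(predict, x, baseline)):
            totals[j] += abs(phi)
    importance = [(names[j], totals[j] / len(X)) for j in range(m)]
    return sorted(importance, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy scorer and made-up rows; not the study's random forest or data.
    names = ["age", "albumin", "total_protein"]
    toy_model = lambda z: 0.05 * z[0] - 0.1 * z[1] + 0.02 * z[2]
    rows = [[55.0, 42.0, 70.0], [68.0, 38.0, 66.0], [72.0, 35.0, 74.0], [49.0, 45.0, 68.0]]
    for name, value in rank_by_mean_abs_shap(toy_model, rows, names):
        print(f"{name}: {value:.3f}")
```

Exact enumeration costs 2^(m-1) coalitions per feature, which stays small for nine predictors; for larger feature sets, approximate or model-specific SHAP algorithms are used instead.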
Constructing SCAS risk models for the Chinese T2DM population contributes to the early prevention of, and intervention in, SCAS, which would reduce the incidence of adverse cardiovascular events.