Shanghai AI Laboratory, Shanghai, China.
Department of Endocrinology and Metabolism, Peking University People's Hospital, Beijing, China.
J Diabetes Investig. 2023 Nov;14(11):1289-1302. doi: 10.1111/jdi.14069. Epub 2023 Aug 22.
AIMS/INTRODUCTION: Clinical guidelines for the management of individuals with type 2 diabetes mellitus endorse the systematic assessment of atherosclerotic cardiovascular disease risk for early interventions. In this study, we aimed to develop machine learning models to predict 3-year atherosclerotic cardiovascular disease risk in Chinese type 2 diabetes mellitus patients.
MATERIALS AND METHODS: Clinical records of 4,722 individuals with type 2 diabetes mellitus admitted to 94 hospitals were used. The features included demographic information, disease histories, laboratory tests and physical examinations. Logistic regression, support vector machine, gradient boosting decision tree, random forest and adaptive boosting were applied for model construction. The performance of these models was evaluated using the area under the receiver operating characteristic curve. Additionally, we applied SHapley Additive exPlanation values to explain the prediction model.
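As an illustration of the evaluation metric named above, the area under the receiver operating characteristic curve can be computed as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count as one half). The following is a minimal sketch in plain Python, not the study's code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event within follow-up, 0 = no event.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based formulation or an established library routine would be the usual choice for a cohort of several thousand records.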
RESULTS: All five models achieved good performance in both internal and external test sets (area under the receiver operating characteristic curve >0.8). Random forest showed the highest discrimination ability, with sensitivity and specificity of 0.838 and 0.814, respectively. The SHapley Additive exPlanation analyses showed that a history of diabetic peripheral vascular disease, older age and longer diabetes duration were the three most influential predictors.
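Sensitivity and specificity, as reported above, depend on the decision threshold applied to the predicted risk. The sketch below shows how the two quantities follow from a confusion matrix. It uses hypothetical toy data and an assumed threshold of 0.5, since the abstract does not state the threshold used in the study.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical toy data; the 0.5 threshold is an assumption for illustration.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5
predictions = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is why the two values are reported together alongside the threshold-independent area under the curve.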
CONCLUSIONS: The prediction models offer opportunities to personalize treatment and maximize the benefits of medical interventions.