Xu Hu, Cao Wen-Zhe, Bai Yong-Yi, Dong Jing, Che He-Bin, Bai Po, Wang Jian-Dong, Cao Feng, Fan Li
Chinese PLA Medical School, Chinese PLA General Hospital, Beijing, China.
Department of Cardiology, the Second Medical Center, National Clinical Research Center for Geriatric Diseases, Chinese PLA General Hospital, Beijing, China.
J Geriatr Cardiol. 2022 Jun 28;19(6):445-455. doi: 10.11909/j.issn.1671-5411.2022.06.006.
To establish a machine learning (ML)-based prediction model for coronary heart disease (CHD) in elderly patients with diabetes mellitus (DM).
Using data from the Medical Big Data Research Centre of Chinese PLA General Hospital in Beijing, China, we identified a cohort of elderly inpatients (≥ 60 years) admitted from January 2008 to December 2017, comprising 10,533 patients with DM complicated by CHD and 12,634 patients with DM without CHD. We collected demographic characteristics and clinical data. After selecting the important features, we built five ML models: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), adaptive boosting (Adaboost) and logistic regression (LR). We compared the models' receiver operating characteristic curves, areas under the curve (AUC) and other relevant metrics to determine the optimal classification model. The selected model was then applied to 7,447 elderly patients with DM admitted from January 2018 to December 2019 to further validate its performance.
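The model-comparison step can be illustrated with a minimal sketch. It assumes each candidate model has already produced a predicted probability of CHD for every patient in a held-out set. The function names, model names and toy numbers below are hypothetical and are not taken from the study; the AUC is computed with the standard rank-based definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is not necessarily how the authors computed it.

```python
from typing import Dict, Sequence, Tuple


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model_by_auc(
    y_true: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the highest AUC and all AUCs."""
    aucs = {name: auc_score(y_true, s) for name, s in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Toy labels (1 = CHD, 0 = no CHD) and hypothetical predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2],
}
best, aucs = best_model_by_auc(y_true, toy_scores)
print(best, aucs)
```

The pairwise loop is quadratic in the number of patients, so for a cohort of this size a sorted-rank implementation or a library routine would be the practical choice; the version here is kept explicit so the definition is easy to check.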
Fifteen features were selected and included in the ML models. In the test set, the classification precision of the XGBoost, RF, DT, Adaboost and LR models was 0.778, 0.789, 0.753, 0.750 and 0.689, respectively, and the corresponding AUCs were 0.851, 0.845, 0.823, 0.833 and 0.731. When the XGBoost model, which had the highest AUC, was applied to the newly collected validation dataset, its sensitivity, specificity, precision and AUC were 0.792, 0.808, 0.748 and 0.880, respectively.
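For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity and precision follow from the counts of a binary confusion matrix. The function name and the toy labels are hypothetical; the study's patient-level predictions are not available here, so the example does not reproduce the reported values.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Sensitivity, specificity and precision from binary labels (1 = CHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of CHD cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-CHD cases cleared
        "precision": ratio(tp, tp + fp),    # share of positive calls that are CHD
    }


# Toy example: 4 patients with CHD, 6 without.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(toy_true, toy_pred)
print(metrics)
```

Precision depends on the proportion of CHD cases in the cohort, whereas sensitivity and specificity do not, which is one reason the three values are usually reported together.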
The XGBoost model established in the present study showed useful predictive value for CHD in elderly patients with DM, with an AUC of 0.851 in the test set and 0.880 in the validation cohort.