Wang Ke, Tian Jing, Zheng Chu, Yang Hong, Ren Jia, Liu Yanling, Han Qinghua, Zhang Yanbo
Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China.
Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China.
Comput Biol Med. 2021 Oct;137:104813. doi: 10.1016/j.compbiomed.2021.104813. Epub 2021 Aug 28.
This study sought to evaluate the performance of machine learning (ML) models and to establish an explainable ML model with good predictive performance for 3-year all-cause mortality in patients with heart failure (HF) caused by coronary heart disease (CHD).
We established six ML models using follow-up data to predict 3-year all-cause mortality. Through comprehensive evaluation, the best performing model was used to predict and stratify patients. The log-rank test was used to assess the difference between Kaplan-Meier curves. The association between ML risk and 3-year all-cause mortality was also assessed using multivariable Cox regression. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) method was deployed to calculate 3-year all-cause mortality risk and to generate individual explanations of the model's decisions.
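The comparison of Kaplan-Meier curves described above can be illustrated with a minimal two-group log-rank test. This is a sketch in plain Python, not the authors' analysis code: the function name `logrank_test` and the toy follow-up times are hypothetical, and a real analysis would normally rely on an established survival-analysis package.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time for each subject
    events_* : 1 if the subject died at that time, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    # Distinct times at which at least one event occurred.
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if g == 0 and u >= t)  # at risk, A
        n_b = sum(1 for u, _, g in data if g == 1 and u >= t)  # at risk, B
        d_a = sum(1 for u, e, g in data if g == 0 and u == t and e)
        d_b = sum(1 for u, e, g in data if g == 1 and u == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical, not from the study): a high-risk group with early
# deaths and a low-risk group that is censored later.
high_risk = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low_risk = ([6, 7, 8, 9, 10], [0, 0, 0, 0, 0])
chi2, p_value = logrank_test(*high_risk, *low_risk)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the survival curves of the two risk strata differ more than chance alone would explain.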
Extreme gradient boosting (XGBoost) performed best and was selected to predict and stratify patients. Subjects with a higher ML score had a higher hazard of events (hazard ratio [HR]: 10.351; P < 0.001), and this relationship persisted in multivariable analysis (adjusted HR: 5.343; P < 0.001). Age, N-terminal pro-B-type natriuretic peptide, occupation, New York Heart Association classification, and nitrate drug use were important factors for both genders.
The ML-based risk stratification tool was able to accurately assess and stratify the risk of 3-year all-cause mortality in patients with HF caused by CHD. ML combined with SHAP could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of key features in the model.
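To make the idea of an individualized, additive explanation concrete, the following sketch computes exact Shapley values for one patient under a toy risk model. Everything in it is an assumption for illustration: the function names, the three features, the coefficients, and the single reference patient are hypothetical and are not taken from the study. The SHAP method used with tree models such as XGBoost relies on a faster tree-specific algorithm and a background dataset rather than this brute-force enumeration against one baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline.

    Features outside the coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical linear risk score; coefficients are made up."""
    age, log_nt_probnp, nyha_class = z
    return 0.02 * age + 0.3 * log_nt_probnp + 0.4 * nyha_class


patient = [80, 8.0, 4]    # hypothetical patient
reference = [60, 6.0, 2]  # hypothetical reference patient
phi = shapley_values(toy_risk, patient, reference)
for name, contribution in zip(["age", "log NT-proBNP", "NYHA class"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that lets each feature's influence on an individual prediction be read off directly.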