Zan Jiaxin, Dong Xiaojing, Yang Hong, Yan Jingjing, He Zixuan, Tian Jing, Zhang Yanbo
Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China.
Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People's Republic of China.
Risk Manag Healthc Policy. 2024 Aug 6;17:1921-1936. doi: 10.2147/RMHP.S472398. eCollection 2024.
This study aimed to develop an ensemble model designed for imbalanced data that could accurately predict death in patients with comorbid coronary heart disease (CHD) and hypertension, and to evaluate the factors contributing to death.
Medical records of 1058 patients with coronary heart disease combined with hypertension, excluding those with acute coronary syndrome, were collected. Patients were followed up at the first, third, sixth, and twelfth months after discharge to record death events, and follow-up ended two years after discharge. Patients were divided into survival and nonsurvival groups. Gender, smoking, drinking, chronic obstructive pulmonary disease (COPD), cerebral stroke, diabetes, hyperhomocysteinemia, heart failure, renal insufficiency, and other potential influencing factors were extracted from the medical records and compared between the two groups, and feature selection was performed to construct the models. Because the data were imbalanced, we developed four imbalanced-ensemble prediction models based on Balanced Random Forest (BRF), EasyEnsemble, RUSBoost, and SMOTEBoost, together with two base classification algorithms, AdaBoost and logistic regression. The hyperparameters of each model were tuned with GridSearchCV, and the models were evaluated using the area under the curve (AUC), sensitivity, recall, Brier score, and geometric mean (G-mean). Additionally, to understand how individual variables influence the model's predictions, we applied SHapley Additive exPlanations (SHAP) to the optimal model.
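The balanced sampling that distinguishes BRF from an ordinary random forest can be sketched in plain Python. For each tree, a bootstrap sample is drawn from the minority class (here, deaths), and the same number of cases is drawn with replacement from the majority class, so that every tree is trained on a 1:1 class ratio. This is a minimal illustration that assumes binary labels. The function names and the 10/90 class split are hypothetical, and tree fitting is omitted. EasyEnsemble and RUSBoost apply the related idea of random undersampling of the majority class, EasyEnsemble per AdaBoost learner and RUSBoost within each boosting round. In practice a library class such as imbalanced-learn's `BalancedRandomForestClassifier` handles this; the abstract does not state which implementation the authors used.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Indices for one BRF-style balanced bootstrap sample (binary labels assumed).

    Draws n_min indices with replacement from the minority class and the same
    number with replacement from the majority class, giving a 1:1 class ratio.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # Sort class labels by frequency: smallest count first (the minority class).
    minority, majority = sorted(counts, key=counts.get)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    n = len(min_idx)
    return rng.choices(min_idx, k=n) + rng.choices(maj_idx, k=n)


def balanced_samples(labels, n_trees, seed=0):
    """One balanced bootstrap per tree (hypothetical helper); each would fit one tree."""
    rng = random.Random(seed)
    return [balanced_bootstrap(labels, rng) for _ in range(n_trees)]


# Illustrative labels only (not study data): 1 = died, 0 = survived.
labels = [1] * 10 + [0] * 90
samples = balanced_samples(labels, n_trees=5)
for s in samples:
    print(Counter(labels[i] for i in s))
```

Each of the five samples contains 10 deaths and 10 survivors, even though deaths make up only 10% of the toy data. This rebalancing keeps each tree from being dominated by the majority (survival) class.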
Age, heart rate, COPD, cerebral stroke, heart failure, and renal insufficiency differed significantly between the nonsurvival and survival groups. Among all models, BRF yielded the highest AUC (0.810; 95% CI, 0.778-0.839), sensitivity (0.990; 95% CI, 0.981-1.000), recall (0.990; 95% CI, 0.981-1.000), and G-mean (0.806; 95% CI, 0.778-0.827), as well as the lowest Brier score (0.181; 95% CI, 0.178-0.185). We therefore identified BRF as the optimal model. Furthermore, red blood cell count (RBC), body mass index (BMI), and lactate dehydrogenase (LDH) were identified as important risk factors associated with mortality.
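The evaluation metrics reported above can be written out directly. The following plain-Python sketch assumes binary labels (1 = death) and a 0.5 decision threshold; the abstract does not state the threshold used, and the toy predictions are hypothetical. Sensitivity and recall are the same quantity, TP / (TP + FN), which is why the two reported values coincide. G-mean is the square root of sensitivity × specificity. The Brier score is the mean squared difference between the predicted probability and the observed outcome, so lower is better. AUC is computed here as the probability that a randomly chosen death receives a higher score than a randomly chosen survivor, with ties counted as 0.5.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """Sensitivity, identical to recall: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Specificity: TN / (TN + FP)."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(y_true, y_pred) * specificity(y_true, y_pred))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as P(score of a random positive > score of a random negative); ties = 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions (hypothetical, not from the study); 1 = died.
y_true = [1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("AUC", auc(y_true, y_prob))
print("sensitivity/recall", sensitivity(y_true, y_pred))
print("G-mean", round(g_mean(y_true, y_pred), 3))
print("Brier", round(brier_score(y_true, y_prob), 3))
```

G-mean rewards a classifier only when it performs well on both classes. For this reason it is commonly reported alongside AUC for imbalanced outcomes such as mortality.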
The BRF model, together with SHAP-based interpretation, effectively and accurately predicts mortality in patients with CHD comorbid with hypertension. This model has the potential to help clinicians adjust treatment strategies to improve patient outcomes.