Wang Ke, Tian Jing, Zheng Chu, Yang Hong, Ren Jia, Li Chenhao, Han Qinghua, Zhang Yanbo
Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China.
Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China.
Risk Manag Healthc Policy. 2021 Jun 8;14:2453-2463. doi: 10.2147/RMHP.S310295. eCollection 2021.
This study sought to develop models that accurately identify adverse outcomes in patients with heart failure (HF) and to find the factors that most strongly affect prognosis.
A total of 5004 qualifying cases were selected, of which 498 had adverse outcomes and 4506 were discharged after improvement. The subjects were hospitalized patients diagnosed with HF at a regional cardiovascular hospital and at the cardiology department of a medical university hospital in Shanxi Province, China, between January 2014 and June 2019. The synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE+ENN) was used to pre-process the imbalanced data. Traditional logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) were used to build risk identification models, and each model was repeated 100 times. Model discrimination and calibration were assessed using the F1-score, the area under the receiver-operating characteristic curve (AUROC), and the Brier score. The best performing of the five models was used to identify the risk of adverse outcomes and to evaluate the influencing factors.
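The three evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and probabilities are all assumptions made for illustration only.

```python
# Illustrative sketch only; not the study's implementation.
# Labels are 1 (adverse outcome) or 0 (discharged after improvement).

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auroc(y_true, y_prob):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Toy data (hypothetical, not from the study).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print(f1_score(y_true, y_pred), auroc(y_true, y_prob), brier_score(y_true, y_prob))
```

Higher F1-score and AUROC indicate better discrimination, whereas a lower Brier score indicates better-calibrated probabilities.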
SME-XGBoost was the best-performing model, with a mean F1-score of 0.3673 (95% confidence interval [CI]: 0.3633-0.3712), a mean AUROC of 0.8010 (95% CI: 0.7974-0.8046), and a mean Brier score of 0.1769 (95% CI: 0.1748-0.1789). Age, N-terminal pro-brain natriuretic peptide (NT-proBNP), and pulmonary disease were among the most significant factors for adverse outcomes in patients with HF.
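One common way to summarize a metric across repeated runs as a mean with a 95% confidence interval is sketched below. The abstract does not state how the intervals were computed, so the normal approximation (z = 1.96), the function name `mean_ci`, and the sample values are assumptions for illustration.

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval:
    mean +/- z * (sample standard deviation / sqrt(n)).
    Assumed method; the source does not specify its CI procedure."""
    if len(values) < 2:
        raise ValueError("need at least two repetitions")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * se, mean + z * se

# Hypothetical AUROC values from three repetitions (not study data).
mean, lower, upper = mean_ci([0.79, 0.80, 0.81])
print(f"{mean:.4f} (95% CI: {lower:.4f}-{upper:.4f})")
```

With 100 repetitions per model, as in the study design, the interval narrows in proportion to the square root of the number of runs.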
The combination of SMOTE+ENN and advanced machine learning methods effectively improved the discrimination of adverse outcomes in HF patients, accurately stratified patients at risk of adverse outcomes, and identified the top factors associated with those outcomes. These models and factors underscore the importance of health status data in determining adverse outcomes in patients with HF.