Improving Risk Identification of Adverse Outcomes in Chronic Heart Failure Using SMOTE+ENN and Machine Learning.

Author information

Wang Ke, Tian Jing, Zheng Chu, Yang Hong, Ren Jia, Li Chenhao, Han Qinghua, Zhang Yanbo

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China.

Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China.

Publication information

Risk Manag Healthc Policy. 2021 Jun 8;14:2453-2463. doi: 10.2147/RMHP.S310295. eCollection 2021.

DOI:10.2147/RMHP.S310295

PMID:34149290

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8206455/

Abstract

PURPOSE

This study sought to develop models that identify adverse outcomes well in patients with heart failure (HF) and to find the strongest factors affecting prognosis.

PATIENTS AND METHODS

A total of 5004 qualifying cases were selected, of which 498 had adverse outcomes and 4506 were discharged after improvement. The study subjects were hospitalized patients diagnosed with HF at a regional cardiovascular hospital and the cardiology department of a medical university hospital in Shanxi Province, China, between January 2014 and June 2019. The synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE+ENN) was used to pre-process the imbalanced data. Traditional logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) were used to build risk identification models, and each model was repeated 100 times. Model discrimination and calibration were assessed using the F1-score, the area under the receiver-operating characteristic curve (AUROC), and the Brier score. The best-performing of the five models was used to identify the risk of adverse outcomes and evaluate the influencing factors.
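The SMOTE+ENN pre-processing step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`nearest`, `smote`, `enn`, `smote_enn`), the neighbor counts (k=5 for SMOTE, k=3 for ENN), the choice to oversample to parity, and the toy data are all assumptions made for illustration. A production pipeline would more likely use an established library implementation.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours.
    Needs at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(X, y, k=3):
    """Edited nearest neighbours: drop every sample whose label disagrees
    with the majority vote of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, rng=None):
    """Oversample the minority class to parity (an assumption), then clean
    the combined data with ENN."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), rng=rng)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    return enn(X_res, y_res)


# Toy data with roughly the 9:1 imbalance reported in the abstract.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(10)]
X = majority + minority
y = [0] * 90 + [1] * 10
X_res, y_res = smote_enn(X, y, rng=rng)
print(Counter(y), "->", Counter(y_res))
```

Resampling of this kind is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.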

RESULTS

SME-XGBoost (XGBoost combined with SMOTE+ENN) was the best-performing model, with a mean F1-score of 0.3673 (95% confidence interval [CI]: 0.3633-0.3712), a mean AUROC of 0.8010 (95% CI: 0.7974-0.8046), and a mean Brier score of 0.1769 (95% CI: 0.1748-0.1789). Age, N-terminal pro-brain natriuretic peptide (NT-proBNP), and pulmonary disease were among the most significant factors for adverse outcomes in patients with HF.
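The three reported metrics and their summary across repeated runs can be sketched as follows. This is an illustrative calculation under stated assumptions, not the authors' code: the helper names, the 0.5 classification threshold for the F1-score, and the normal-approximation confidence interval (mean ± 1.96 × standard error) are assumptions, since the abstract does not say how the intervals were derived.

```python
import math
import statistics


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = adverse outcome)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random
    negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI over repeated runs (assumed method)."""
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Hypothetical predictions for eight patients.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]
y_pred = [int(p >= 0.5) for p in probs]  # assumed threshold of 0.5

print(f1_score(y_true, y_pred), auroc(y_true, probs), brier_score(y_true, probs))
```

With 100 repetitions per model, `mean_ci` would be applied to the list of 100 values of each metric. A lower Brier score indicates better calibrated probabilities, while higher F1-score and AUROC indicate better discrimination.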

CONCLUSION

The combination of SMOTE+ENN and advanced machine learning methods improved the discrimination of adverse outcomes in HF patients, accurately stratified patients by risk of adverse outcomes, and identified the top factors associated with those outcomes. These models and factors emphasize the importance of health status data in determining adverse outcomes in patients with HF.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8e0/8206455/62fc3153b6dd/RMHP-14-2453-g0001.jpg
