Machine Learning-Based Risk Factor Analysis and Prediction Model Construction for the Occurrence of Chronic Heart Failure: Health Ecologic Study.

Authors

Xu Qian, Cai Xue, Yu Ruicong, Zheng Yueyue, Chen Guanjie, Sun Hui, Gao Tianyun, Xu Cuirong, Sun Jing

Affiliations

School of Medicine, Southeast University, Nanjing, China.

Department of Respiratory and Critical Care, Zhongda Hospital Southeast University, Nanjing, China.

Publication information

JMIR Med Inform. 2025 Jan 31;13:e64972. doi: 10.2196/64972.

Abstract

BACKGROUND

Chronic heart failure (CHF) is a serious threat to human health, with high morbidity and mortality rates, and it imposes a heavy burden on the health care system and society. The growing abundance of medical data and the rapid development of machine learning (ML) technologies create new opportunities for in-depth investigation of the mechanisms of CHF and for the construction of predictive models. Health ecology research methodology allows CHF risk factors to be examined comprehensively across environmental, social, and individual levels. This not only helps to identify high-risk groups at an early stage but also provides a scientific basis for the development of precise prevention and intervention strategies.

OBJECTIVE

This study aims to use ML to construct a predictive model of the risk of occurrence of CHF and analyze the risk of CHF from a health ecology perspective.

METHODS

This study sourced data from the Jackson Heart Study database. Stringent data preprocessing procedures were implemented, which included meticulous management of missing values and the standardization of data. Principal component analysis and random forest (RF) were used as feature selection techniques. Subsequently, several ML models, namely decision tree, RF, extreme gradient boosting, adaptive boosting (AdaBoost), support vector machine, naive Bayes model, multilayer perceptron, and bootstrap forest, were constructed, and their performance was evaluated. The effectiveness of the models was validated through internal validation using a 10-fold cross-validation approach on the training and validation sets. In addition, the performance metrics of each model, including accuracy, precision, sensitivity, F-score, and area under the curve (AUC), were compared. After selecting the best model, we used hyperparameter optimization to construct a better model.
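The workflow described above (standardization, RF-based feature selection, an AdaBoost classifier, 10-fold cross-validation, and random-search tuning) can be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the authors' code: the data are synthetic stand-ins generated with `make_classification` rather than Jackson Heart Study records, the hyperparameter candidates and split sizes are assumptions, and the class-balancing step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: NOT the study data).
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Standardize features (part of the preprocessing described above).
    ("scale", StandardScaler()),
    # Keep the 21 features ranked highest by random forest importance;
    # threshold=-inf makes max_features the only selection criterion.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        max_features=21, threshold=-np.inf,
    )),
    ("clf", AdaBoostClassifier(random_state=0)),
])

# Hypothetical search space; the abstract does not list the tuned values.
param_distributions = {
    "clf__n_estimators": [25, 50, 100],
    "clf__learning_rate": [0.1, 0.5, 1.0],
}

# Random search scored by AUC with stratified 10-fold cross-validation.
search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=3,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out split.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())
print("best params:", search.best_params_)
print("cross-validated AUC:", round(search.best_score_, 3))
print("held-out AUC:", round(test_auc, 3))
print("features kept:", n_selected)
```

Placing the scaler and the feature selector inside the pipeline means they are refitted on each training fold, so the cross-validated score is not inflated by information from the held-out fold. Whether the study arranged its steps this way is not stated in the abstract.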

RESULTS

RF-selected features (21 in total) had an average root mean square error of 0.30, outperforming principal component analysis. Synthetic Minority Oversampling Technique and Edited Nearest Neighbors showed better accuracy in data balancing. The AdaBoost model was the most effective, with an AUC of 0.86, accuracy of 75.30%, precision of 0.86, sensitivity of 0.69, and F-score of 0.76. Validation on the training and validation sets through 10-fold cross-validation gave an AUC of 0.97, an accuracy of 91.27%, a precision of 0.94, a sensitivity of 0.92, and an F-score of 0.94. After hyperparameter tuning by random search, the accuracy of AdaBoost improved from 75.30% to 77.68%; its AUC is also reported as improved but remains 0.86 at two decimal places.
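For readers less familiar with the reported metrics, the short sketch below shows how accuracy, precision, sensitivity, and F-score follow from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study. The helper names `classification_metrics` and `f_score` are likewise illustrative.

```python
def f_score(precision: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)      # share of predicted positives that are true
    sensitivity = tp / (tp + fn)    # share of actual positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_score": f_score(precision, sensitivity),
    }


# Hypothetical confusion counts (NOT from the study).
metrics = classification_metrics(tp=69, fp=11, fn=31, tn=89)
print(metrics)

# Harmonic mean of the rounded precision and sensitivity reported above.
f_from_reported = f_score(0.86, 0.69)
print(round(f_from_reported, 4))
```

Applying the harmonic mean to the rounded precision (0.86) and sensitivity (0.69) gives roughly 0.766, close to the reported F-score of 0.76; a gap of this size is consistent with rounding of the inputs.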

CONCLUSIONS

This study offered insights into CHF risk prediction. Future research should focus on prospective studies, diverse data, advanced techniques, longitudinal studies, and exploring factor interactions for better CHF prevention and management.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8db/11829185/ac09a3a911ac/medinform_v13i1e64972_fig1.jpg
