
An Ensemble Machine Learning and Data Mining Approach to Enhance Stroke Prediction.

Author information

Wijaya Richard, Saeed Faisal, Samimi Parnia, Albarrak Abdullah M, Qasem Sultan Noman

Affiliations

College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK.

Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia.

Publication information

Bioengineering (Basel). 2024 Jul 2;11(7):672. doi: 10.3390/bioengineering11070672.

Abstract

Stroke poses a significant health threat, affecting millions of people annually. Early and precise prediction is crucial for providing effective preventive healthcare interventions. This study applied an ensemble machine learning and data mining approach to improve the effectiveness of stroke prediction. Following the cross-industry standard process for data mining (CRISP-DM) methodology, various techniques, including random forest, ExtraTrees, XGBoost, an artificial neural network (ANN), and a genetic algorithm combined with an ANN (GANN), were applied to two benchmark datasets to predict stroke from parameters such as gender, age, various diseases, smoking status, BMI, HighCol (high cholesterol), physical activity, hypertension, heart disease, lifestyle, and others. Because the datasets were imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was applied to them. Hyperparameters were tuned with grid search and randomized search cross-validation. The evaluation metrics were accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The experimental results show that the ensemble ExtraTrees classifier achieved the highest accuracy (98.24%) and AUC (98.24%). Random forest also performed well, reaching 98.03% in both accuracy and AUC. Comparisons with state-of-the-art stroke prediction methods showed that the proposed approach demonstrates superior performance, indicating its potential as a promising method for stroke prediction with substantial benefits for healthcare.
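The abstract gives no implementation details, so the following is only a minimal, dependency-free Python sketch of two steps it names: SMOTE oversampling of the minority (stroke) class, and the evaluation metrics (accuracy, precision, recall, F1-score, AUC). The function names `smote`, `binary_metrics` and `roc_auc` are hypothetical and this is not the authors' code; the classifiers themselves (ExtraTrees, random forest, XGBoost, ANN, GANN) and the grid/randomized search are not shown. In practice one would more likely use a library implementation such as imbalanced-learn's `SMOTE` and scikit-learn's metric functions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for hard binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation); O(n^2), fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]  # toy minority-class features
    print(smote(minority, n_synthetic=4, k=2))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled data stays inside the region already occupied by the minority class rather than duplicating rows exactly. The AUC function expects continuous scores (e.g. predicted probabilities), whereas `binary_metrics` expects thresholded class labels.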

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caa9/11274138/0acbdd46beab/bioengineering-11-00672-g001.jpg