Abousaber Inam
Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 47912, Saudi Arabia.
Sensors (Basel). 2025 Mar 11;25(6):1739. doi: 10.3390/s25061739.
Accurate prediction of brain stroke is critical for effective diagnosis and management, yet the class imbalance typical of medical datasets often degrades the performance of conventional machine learning models. To address this challenge, we propose a novel meta-learning framework that integrates hybrid resampling techniques, ensemble-based classifiers, and explainable artificial intelligence (XAI) to improve both predictive performance and interpretability. The framework uses SMOTE and SMOTEENN to handle class imbalance, dynamic feature selection to reduce noise, and a meta-learning stage that combines the predictions of Random Forest and LightGBM and refines them with a deep learning-based meta-classifier. SHAP (SHapley Additive exPlanations) provides transparent insight into feature contributions, increasing trust in the model's predictions. Evaluated on three datasets (DF-1, DF-2, and DF-3), the proposed framework consistently outperformed state-of-the-art methods, achieving accuracy and F1-score of 0.992189 and 0.992579 on DF-1, 0.980297 and 0.981916 on DF-2, and 0.981901 and 0.983365 on DF-3. These results validate the robustness and effectiveness of the approach, which significantly improves the detection of minority-class instances while maintaining overall performance. This work establishes a reliable solution for stroke prediction and provides a foundation for applying meta-learning and explainable AI to other imbalanced medical prediction tasks.
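To make the resampling step concrete, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority-class neighbours. It is an illustration only, not the author's implementation and not the imbalanced-learn API. The function name `smote_oversample`, its parameters, and the toy data are all hypothetical, and the Edited Nearest Neighbours cleaning pass that SMOTEENN applies after oversampling is not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; not a library function.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 20 majority samples and 4 minority samples.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. In a setup like the one the abstract describes, resampling of this kind would normally be applied to the training split only, so that evaluation data remain untouched.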