Awe Olushina Olawale, Mwangi Peter Njoroge, Goudoungou Samuel Kotva, Esho Ruth Victoria, Oyejide Olanrewaju Samuel
Statistical Learning Lab, Federal University of Bahia, Salvador, Brazil.
Department of Data Science, African Institute for Mathematical Sciences (AIMS), Limbe, Cameroon.
BMC Med Inform Decis Mak. 2025 Apr 11;25(1):162. doi: 10.1186/s12911-025-02874-3.
Malaria, an infectious disease caused by protozoan parasites belonging to the Plasmodium genus, remains a significant public health challenge, with African regions bearing the heaviest burden. Machine learning techniques have shown great promise in improving the diagnosis of infectious diseases, such as malaria.
This study aims to integrate ensemble machine learning models and Explainable Artificial Intelligence (XAI) frameworks to improve the accuracy of malaria diagnosis.
The study used a dataset from the Federal Polytechnic Ilaro Medical Centre, Ilaro, Ogun State, Nigeria, collected over a 4-week period and covering 337 patients aged between 3 and 77 years (180 females and 157 males). Five ensemble methods (Random Forest, AdaBoost, Gradient Boost, XGBoost, and CatBoost) were trained after class imbalance was addressed through oversampling techniques. Explainable AI techniques, namely Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive Explanations (SHAP), and Permutation Feature Importance, were applied to enhance transparency and interpretability.
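The workflow described above (oversample, fit an ensemble, score by ROC AUC, then inspect feature importance) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the data are synthetic (generated with scikit-learn's `make_classification`), the feature count, class ratio, train/test split, and hyperparameters are assumptions, and `random_oversample` is a hypothetical helper standing in for whichever oversampling technique the study actually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinic data: 337 rows, imbalanced classes (assumed ratio).
X, y = make_classification(n_samples=337, n_features=8, n_informative=4,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate minority-class rows until classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.where(y == cls)[0]
        # Draw extra rows with replacement only for classes below the target size.
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Oversample the training split only, so the test split stays untouched.
X_bal, y_bal = random_oversample(X_train, y_train)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_bal, y_bal)

# ROC AUC is computed from predicted probabilities of the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Random Forest ROC AUC on held-out data: {auc:.3f}")

# Permutation Feature Importance: drop in ROC AUC when one feature is shuffled.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f}")
```

Oversampling is applied after the split here so that duplicated rows cannot leak into the evaluation set; whether the study followed the same order is not stated in the abstract.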
Among the ensemble models, Random Forest demonstrated the highest performance with an ROC AUC score of 0.869, followed by CatBoost at 0.787. XGBoost, Gradient Boost, and AdaBoost achieved ROC AUC scores of 0.770, 0.747, and 0.633, respectively. The explainability methods (LIME, SHAP, and Permutation Feature Importance) evaluated the influence of individual features on the predicted probability of a malaria diagnosis, revealing the features that contribute most to prediction outcomes.
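For readers less familiar with the metric, ROC AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The short sketch below illustrates that reading on four made-up scores (not data from the study) and checks it against scikit-learn's `roc_auc_score`; `pairwise_auc` is a hypothetical helper written for this illustration.

```python
from itertools import product

from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_manual = pairwise_auc(y_true, scores)      # 3 of 4 pairs ranked correctly
auc_sklearn = roc_auc_score(y_true, scores)
print(auc_manual, auc_sklearn)
```

On this reading, a score of 0.869 means the model ranks a malaria-positive patient above a malaria-negative one in roughly 87% of such pairs, while 0.5 corresponds to random ranking.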
By integrating ensemble machine learning models with explainable AI frameworks, the study promoted transparency in decision-making processes, thereby empowering healthcare providers with actionable insights for improved treatment strategies and enhanced patient outcomes, particularly in malaria management.