Predicting malaria outbreak in The Gambia using machine learning techniques.

Affiliations

Department of Mathematics, College of Computing and Mathematics, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.

Interdisciplinary Research Center for Refining & Advanced Chemicals, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.

Publication information

PLoS One. 2024 May 16;19(5):e0299386. doi: 10.1371/journal.pone.0299386. eCollection 2024.

DOI:10.1371/journal.pone.0299386

PMID:38753678

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11098333/

Abstract

Malaria is the most common cause of death among the parasitic diseases. Malaria continues to pose a growing threat to the public health and economic growth of nations in the tropical and subtropical parts of the world. This study aims to address this challenge by developing a predictive model for malaria outbreaks in each district of The Gambia, leveraging historical meteorological data. To achieve this objective, we employ and compare the performance of eight machine learning algorithms, including C5.0 decision trees, artificial neural networks, k-nearest neighbors, support vector machines with linear and radial kernels, logistic regression, extreme gradient boosting, and random forests. The models are evaluated using 10-fold cross-validation during the training phase, repeated five times to ensure robust validation. Our findings reveal that extreme gradient boosting and decision trees exhibit the highest prediction accuracy on the testing set, achieving 93.3% accuracy, followed closely by random forests with 91.5% accuracy. In contrast, the support vector machine with a linear kernel performs less favorably, showing a prediction accuracy of 84.8% and underperforming in specificity analysis. Notably, the integration of both climatic and non-climatic features proves to be a crucial factor in accurately predicting malaria outbreaks in The Gambia.
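The validation scheme described in the abstract (10-fold cross-validation repeated five times, with accuracy and specificity as reported metrics) can be illustrated with a minimal sketch. The abstract does not state which software or splitting strategy the authors used, so the code below is an assumption-laden illustration in plain Python rather than the study's actual pipeline. The function names `repeated_kfold` and `accuracy_and_specificity`, the seed, and the label encoding (1 = outbreak, 0 = no outbreak) are all hypothetical.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Hypothetical helper: plain (non-stratified) splitting is assumed here;
    the abstract does not say whether the folds were stratified.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            # Training set = every fold except the held-out one.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


def accuracy_and_specificity(y_true, y_pred):
    """Compute accuracy and specificity for binary labels.

    Assumed encoding: 1 = outbreak, 0 = no outbreak.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    # Specificity = true negatives / all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity


# 10 folds x 5 repeats gives 50 train/test splits.
splits = list(repeated_kfold(n_samples=100, k=10, repeats=5))
print(len(splits))  # 50

# Toy predictions, not data from the study.
print(accuracy_and_specificity([1, 0, 0, 1], [1, 0, 1, 1]))  # (0.75, 0.5)
```

In practice each split would be used to fit a model on `train_idx` and score it on `test_idx`, and the 50 scores would be averaged. Specificity is shown alongside accuracy because a model can reach high accuracy while still misclassifying many non-outbreak cases, which is the weakness the abstract reports for the linear-kernel support vector machine.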
