College of Veterinary Medicine, Chungbuk National University, Cheongju, Korea.
Division of Infectious Diseases, Department of Internal Medicine, College of Medicine, Soonchunhyang University, Asan, Korea.
J Korean Med Sci. 2024 Jun 10;39(22):e176. doi: 10.3346/jkms.2024.39.e176.
Malaria elimination strategies in the Republic of Korea (ROK) have decreased malaria incidence but face challenges due to delayed case detection and response. To improve this, machine learning models for predicting malaria, focusing on high-risk areas, have been developed.
The study targeted the northern region of ROK, near the demilitarized zone, using a 1-km grid to define the prediction units. Grid cells without residential buildings were excluded, leaving 8,425 cells. The prediction target was whether at least one malaria case was reported in a given grid cell in a given month, derived from spatial data on patient locations. Four models were used: a gradient boosted model (GBM), a generalized linear model (GLM), an extreme gradient boosted model (XGB), and an ensemble model, each incorporating environmental, sociodemographic, and meteorological data as predictors. The models were trained with data from May to October of 2019-2021 and tested with data from May to October 2022. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC).
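The labelling, temporal split, and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Case` record, the function names, and the toy data are hypothetical, and the AUROC is computed with the pairwise (Mann-Whitney) formulation, which is simple but quadratic in the number of samples.

```python
from collections import namedtuple

# Hypothetical record: one reported case located in a grid cell in a given month.
Case = namedtuple("Case", ["cell_id", "year", "month"])


def build_labels(cell_ids, cases, years, months=range(5, 11)):
    """Return {(cell_id, year, month): 0 or 1}.

    A label of 1 means at least one case was reported in that cell-month.
    The default months cover May to October, as in the study period.
    """
    positive = {(c.cell_id, c.year, c.month) for c in cases}
    return {
        (cell, y, m): int((cell, y, m) in positive)
        for cell in cell_ids
        for y in years
        for m in months
    }


def temporal_split(labels, test_year):
    """Train on all years before test_year, test on test_year only."""
    train = {k: v for k, v in labels.items() if k[1] < test_year}
    test = {k: v for k, v in labels.items() if k[1] == test_year}
    return train, test


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): two cells, two years.
cases = [Case("A", 2021, 6), Case("A", 2021, 6), Case("B", 2022, 7)]
labels = build_labels(["A", "B"], cases, years=[2021, 2022])
train, test = temporal_split(labels, test_year=2022)
```

Splitting by year rather than at random keeps the test season strictly after the training seasons, which matches the study's design of training on 2019-2021 and testing on 2022.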
The prediction models achieved excellent AUROC values (GBM = 0.9243, GLM = 0.9060, XGB = 0.9180, and ensemble model = 0.9301). Previous malaria risk, population size, and meteorological factors were the most influential predictors in the GBM and XGB models.
Machine-learning models with properly preprocessed malaria case data can provide reliable predictions. Additional predictors, such as mosquito density, should be included in future studies to improve the performance of models.