Rahman Md Siddikur, Amrin Miftahuzzannat, Bokkor Shiddik Md Abu
Department of Statistics, Begum Rokeya University, Rangpur, Bangladesh.
Health Sci Rep. 2025 May 9;8(5):e70726. doi: 10.1002/hsr2.70726. eCollection 2025 May.
Dengue fever (DF), a life-threatening vector-borne disease, poses significant public health and economic threats worldwide, including in Bangladesh. Identifying dengue risk factors is crucial for early warning systems that forecast epidemics and for developing efficient control strategies. We therefore propose an interpretable tree-based machine learning (ML) model for dengue early warning and outbreak prediction in Bangladesh based on climatic, sociodemographic, and landscape factors.
A framework for forecasting DF risk was developed using three high-performance ML algorithms, namely Random Forest, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), based on sociodemographic, climate, landscape, and dengue surveillance data (January 2000 to December 2021). The candidate models were compared after hyperparameter optimization to select the optimal tree-based model with strong interpretability. SHapley Additive exPlanations (SHAP) values were used to rank feature importance and to identify the most significant dengue drivers.
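To make the attribution step concrete, the following is a minimal sketch of how Shapley values divide a single prediction among its input features. It is not the authors' pipeline: the study applies SHAP to fitted tree models, whereas this sketch enumerates feature coalitions by brute force on a hand-written risk function. The function `toy_risk`, the feature names `min_temp_c`, `pop_density`, and `agri_land`, and all numeric values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` relative to `baseline`.

    Features outside a coalition are held at their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k (excluding the target)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without = {f: (x[f] if f in coalition else baseline[f])
                           for f in names}
                with_target = dict(without)
                with_target[target] = x[target]
                # Marginal contribution of the target feature to this coalition
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score (NOT a fitted model from the study)."""
    # Nonlinear temperature effect that peaks near 26.5 degrees C (assumed shape)
    temp_effect = max(0.0, 1.0 - abs(f["min_temp_c"] - 26.5) / 5.0)
    return (2.0 * f["pop_density"]
            + 1.0 * f["agri_land"]
            + 3.0 * temp_effect * f["pop_density"])


# Hypothetical instance and reference point, in arbitrary scaled units
x = {"min_temp_c": 26.5, "pop_density": 1.0, "agri_land": 0.5}
baseline = {"min_temp_c": 15.0, "pop_density": 0.0, "agri_land": 0.0}

phi = shapley_values(toy_risk, x, baseline)
# Rank features by absolute contribution, as in a SHAP importance plot
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
```

The contributions in `phi` sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that lets SHAP values be read as an additive breakdown of a forecast. For real tree ensembles, tree-specific algorithms obtain the same kind of attribution without enumerating every coalition.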
We detected nonlinear effects of climatic parameters on dengue, with thresholds at a mean temperature of 27°C, a minimum temperature of 22°C, a maximum temperature of 32°C, and a relative humidity of 82%. The optimal minimum temperature, maximum temperature, humidity, rainfall, and wind speed for dengue risk are 25-28°C, 32-34°C, 75%-85%, 10 mm, and 12 m/s, respectively. The LightGBM model forecast DF accurately, and agricultural land, population density, and minimum temperature significantly affected dengue outbreaks in Bangladesh.
Our proposed ML model functions as an early warning system, improving understanding of the factors that precipitate dengue outbreaks and providing a framework for applying advanced analytical techniques in public health.