Melese Tadele, Assefa Gizachew, Terefe Baye, Belay Tatek, Bayable Getachew, Senamew Abebe
Department of Natural Resource Management, College of Agriculture and Environmental Science, Bahir Dar University, Bahir Dar, Ethiopia.
Department of Information System Science, Faculty of Science and Engineering, Soka University, Tokyo, Japan.
PLoS One. 2025 Jun 18;20(6):e0326174. doi: 10.1371/journal.pone.0326174. eCollection 2025.
Accurate drought prediction is essential for proactive water management and agricultural planning, especially in regions such as Ethiopia that are highly susceptible to climate variability. This study investigates the classification of the Palmer Drought Severity Index (PDSI) using machine learning models trained on TerraClimate data, incorporating variables such as precipitation, temperature, soil moisture, and vapor pressure deficit. We employed nine classifiers: Logistic Regression, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Decision Tree, Random Forest, Gradient Boosting, Naive Bayes, AdaBoost, and XGBoost, with Logistic Regression serving as a baseline statistical approach for comparison. To address data imbalance across drought classes, we applied a hybrid resampling method combining manual upsampling and SMOTE. Hyperparameters were tuned using grid search with cross-validation. Random Forest outperformed all other models, achieving an accuracy of 71.18%, an F1-score of 0.71, and a ROC AUC of 0.9000. Gradient Boosting and SVM also performed well, with ROC AUC values of 0.8982 and 0.8681, respectively. SHAP analysis revealed that soil moisture, precipitation, and vapor pressure deficit were the most influential features in predicting drought severity. For benchmarking, an ARIMA(3,1,2) time-series model was applied but performed poorly (RMSE = 1.789, R² = -0.077; a negative R² means the forecasts fit worse than simply predicting the mean), supporting the advantages of non-linear machine learning techniques for complex climate data. The results highlight the utility of ensemble learning in environmental modelling, offering valuable insights for drought early warning systems and climate resilience planning in Ethiopia. Future work should explore integrating localized predictors and real-time data to enhance prediction robustness.
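The abstract does not spell out how the hybrid resampling was implemented. As a rough illustration of the idea behind SMOTE only, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours, and falls back to plain duplication when a class has a single member. The function name `smote_like_oversample`, the parameter `k`, the class labels, and the toy feature values are all assumptions made for this example; they are not taken from the study or from any library.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class.

    Synthetic points are drawn on the line segment between a sample and
    one of its k nearest neighbours of the same class (the SMOTE idea).
    Hypothetical helper for illustration, not the authors' code.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]
    y_out = list(y)

    for label, n in counts.items():
        # Only original samples are used as interpolation endpoints.
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            i = rng.randrange(len(members))
            base = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            if not others:
                # A single-member class cannot be interpolated: duplicate it.
                X_out.append(list(base))
            else:
                neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
                other = rng.choice(neighbours)
                gap = rng.random()  # position along the segment, in [0, 1)
                X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): two features per sample, labels are hypothetical
# drought classes with "normal" as the majority class.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["normal", "normal", "normal", "normal", "severe", "severe"]

X_bal, y_bal = smote_like_oversample(X, y, k=3, seed=42)
print(Counter(y_bal))
```

In practice, resampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used to report accuracy, F1-score, or ROC AUC.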