Jayaramu Veianthan, Zulkafli Zed, De Stercke Simon, Buytaert Wouter, Rahmat Fariq, Abdul Rahman Ribhan Zafira, Ishak Asnor Juraiza, Tahir Wardah, Ab Rahman Jamalludin, Mohd Fuzi Nik Mohd Hafiz
Department of Civil Engineering, Universiti Putra Malaysia, Serdang, Malaysia.
Department of Civil and Environmental Engineering, Imperial College London, London, UK.
Int J Biometeorol. 2023 Mar;67(3):423-437. doi: 10.1007/s00484-022-02422-y. Epub 2023 Jan 31.
Leptospirosis is a zoonosis that has been linked to hydrometeorological variability. Hydrometeorological averages and extremes have previously been used as drivers in the statistical prediction of the disease, but their relative importance and predictive capacity remain poorly understood. This study explored the use of a random forest classifier to analyze the relative importance of hydrometeorological indices in developing a leptospirosis model and to evaluate model performance by the type of indices used. Case data came from three districts in Kelantan, Malaysia, that experience annual monsoonal rainfall and flooding. First, hydrometeorological data including rainfall, streamflow, water level, relative humidity, and temperature were transformed into 164 weekly average and extreme indices in accordance with the Expert Team on Climate Change Detection and Indices (ETCCDI). Then, weekly case occurrences were classified into the binary classes "high" and "low" based on an average threshold. Seventeen models based on "average," "extreme," and "mixed" indices were trained, with feature subsets optimized using the model-computed mean decrease Gini (MDG) scores. Variable importance was assessed through cross-correlation analysis and the MDG score. The average and extreme models showed similar prediction accuracy ranges (61.5-76.1% and 72.3-77.0%, respectively), while the mixed models showed an improvement (71.7-82.6%). An extreme model was the most sensitive, while an average model was the most specific. The time lags associated with the driving indices agreed with the seasonality of the monsoon. The extreme rainfall variable was the most important in classifying leptospirosis occurrence, while streamflow was the least important despite showing higher correlations with leptospirosis.
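The labeling and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All numbers are made-up toy values, and the names `weekly_cases`, `index_b`, and the split value 5 are hypothetical. It assumes that "average threshold" means the mean weekly case count. It shows the Gini impurity decrease for a single split only, whereas a random forest's MDG score accumulates such decreases over every split on a feature across all trees.

```python
from statistics import mean


def label_weeks(weekly_cases):
    """Label a week 'high' if its case count exceeds the mean weekly count."""
    threshold = mean(weekly_cases)  # assumption: "average threshold" = mean
    return ["high" if c > threshold else "low" for c in weekly_cases]


def gini(labels):
    """Gini impurity of a list of binary 'high'/'low' labels."""
    if not labels:
        return 0.0
    p_high = labels.count("high") / len(labels)
    return 1.0 - p_high ** 2 - (1.0 - p_high) ** 2


def gini_decrease(feature, labels, split):
    """Impurity decrease from splitting the samples at feature <= split."""
    left = [y for x, y in zip(feature, labels) if x <= split]
    right = [y for x, y in zip(feature, labels) if x > split]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def confusion_metrics(actual, predicted):
    """Return (accuracy, sensitivity, specificity), treating 'high' as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "high" and p == "high" for a, p in pairs)
    tn = sum(a == "low" and p == "low" for a, p in pairs)
    fp = sum(a == "low" and p == "high" for a, p in pairs)
    fn = sum(a == "high" and p == "low" for a, p in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


# Toy data: eight weeks of case counts and one hypothetical weekly index.
weekly_cases = [0, 1, 5, 7, 2, 0, 6, 1]
index_b = [3, 9, 8, 9, 2, 1, 4, 7]

labels = label_weeks(weekly_cases)
decrease = gini_decrease(index_b, labels, split=5)

# A one-split "classifier": predict 'high' when the index exceeds the split.
predicted = ["high" if x > 5 else "low" for x in index_b]
accuracy, sensitivity, specificity = confusion_metrics(labels, predicted)
```

Sensitivity here is the share of "high" weeks correctly flagged, and specificity is the share of "low" weeks correctly left unflagged. These are the quantities behind the abstract's statement that an extreme model was the most sensitive and an average model the most specific.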