Initiatives for Conservation, Landscape Ecology, Bioprospecting, and Biomodeling (ICOLABB), Research Center for the Natural and Applied Sciences, University of Santo Tomas, España, Manila 1008, Philippines.
Initiatives for Conservation, Landscape Ecology, Bioprospecting, and Biomodeling (ICOLABB), Research Center for the Natural and Applied Sciences, University of Santo Tomas, España, Manila 1008, Philippines; Department of Biological Sciences, College of Science, University of Santo Tomas, España, Manila 1008, Philippines; The Graduate School, University of Santo Tomas, España, Manila 1008, Philippines.
Acta Trop. 2024 Jul;255:107225. doi: 10.1016/j.actatropica.2024.107225. Epub 2024 May 1.
Previous dengue epidemiological analyses have been limited in spatiotemporal extent or in covariate dimensions, the latter neglecting the multifactorial nature of dengue. These constraints stem from rigid, traditional statistical tools that break down under 'Big Data', and they motivate interpretable machine-learning (iML) approaches. To predict dengue incidence and mortality in the Philippines, a data-limited yet high-burden country, the mlr3 universe of R packages was used to build and optimize ML models based on remotely sensed, province-level, dekadal (10-day) features, 3 derived from NDVI and 9 from rainfall, covering 2016 to 2020. Across the two tasks, candidate models vary by four random forest-based learners and two clustering strategies. Among the 16 candidates, rfsrc-year-case and ranger-year-death perform significantly best for predicting dengue incidence and mortality, respectively. Temporal clustering therefore yields the best models, reflecting dengue seasonality. The two best models were subjected to tripartite global exploratory model analyses, which encompass model-agnostic post-hoc methods such as Permutation Feature Importance (PFI) and Accumulated Local Effects (ALE). PFI reveals that the models differ in their most important explanatory aspect, rainfall for rfsrc-year-case and NDVI for ranger-year-death, and that within each aspect the long-term average (lta) features are most relevant. Trend-wise, ALE reveals that average incidence predictions are positively associated with 'Rain.lta', reflecting dengue cases peaking during the wet season. In contrast, average mortality predictions are negatively associated with 'NDVI.lta', reflecting urban spaces driving dengue-related deaths. By technologically addressing the challenges of the human-animal-ecosystem interface, this study adheres to the One Digital Health paradigm operationalized under the Sustainable Development Goals (SDGs). Leveraging data digitization and predictive modeling for epidemiological research paves the way toward SDG 3, which prioritizes holistic health and well-being.
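As an illustration of what a model-agnostic method such as PFI measures, the following is a minimal Python sketch, not the authors' mlr3 pipeline in R. It shuffles one feature at a time and records how much the prediction error (RMSE) increases. The data are synthetic, and the names `rain_lta`, `ndvi_lta`, `toy_model`, and `permutation_importance` are hypothetical stand-ins chosen for this example only; they are not taken from the study's code, and the toy linear model is not a random forest.

```python
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature is shuffled in turn.

    `predict` is any function mapping one row (a dict) to a prediction,
    which is what makes the procedure model-agnostic.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = {}
    for name in X[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle only this feature's column, breaking its link to y.
            shuffled = [row[name] for row in X]
            rng.shuffle(shuffled)
            X_perm = [{**row, name: v} for row, v in zip(X, shuffled)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances[name] = mean(increases)
    return importances


# Synthetic data (assumption: NOT the study's data or effect sizes).
data_rng = random.Random(42)
X = [
    {"rain_lta": data_rng.uniform(0, 300), "ndvi_lta": data_rng.uniform(0.1, 0.9)}
    for _ in range(200)
]
y = [0.5 * r["rain_lta"] + 2.0 * r["ndvi_lta"] + data_rng.gauss(0, 1) for r in X]


def toy_model(row):
    """Hypothetical fitted model; any predictor could be passed instead."""
    return 0.5 * row["rain_lta"] + 2.0 * row["ndvi_lta"]


pfi = permutation_importance(toy_model, X, y)
for feature, increase in sorted(pfi.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: RMSE increase {increase:.3f}")
```

In this toy setup the rainfall feature dominates only because the synthetic response was constructed that way; the ranking says nothing about the study's actual findings.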