Morlighem Camille, Nnanatu Chibuzor Christopher, Visée Corentin, Fall Atoumane, Linard Catherine
Department of Geography, University of Namur, Namur, Belgium.
ILEE, University of Namur, Namur, Belgium.
PLoS One. 2025 May 29;20(5):e0322819. doi: 10.1371/journal.pone.0322819. eCollection 2025.
Accurate mapping and disaggregation of key health and demographic risk factors have become increasingly important for disease surveillance, which can reveal geographical social inequalities for improved health interventions and for monitoring progress on relevant Sustainable Development Goals (SDGs). Household surveys like the Demographic and Health Surveys have been widely used as a proxy for mapping SDG-related household characteristics. However, there is no consensus on the workflow to be used, and different methods have been implemented with varying complexities. This study aims to compare multiple modelling frameworks to model indicators of human vulnerability to malaria (SDG Target 3.3) in Senegal. These indicators were categorised into socioeconomic (e.g., stunting prevalence, wealth index) and malaria prevention indicators (e.g., indoor residual spraying, insecticide-treated net ownership). We compared three categories of commonly used methods: (1) spatial interpolation methods (i.e., inverse distance weighting, thin plate splines, kriging), (2) ensemble methods (i.e., random forest), and (3) Bayesian geostatistical models. Most indicators could be modelled with medium to high predictive accuracy, with R² values ranging from 0.40 to 0.86. No method or method category emerged as the best, but performance varied widely. Socioeconomic indicators were generally better predicted by covariate-based models (e.g., random forest and Bayesian models), while methods using spatial autocorrelation alone (e.g., thin plate splines) performed better for variables with heterogeneous spatial structure, such as ethnicity and malaria prevention indicators. Increasing the complexity of the models did not always improve predictive performance, e.g., thin plate splines sometimes outperformed random forest or Bayesian geostatistical models. Beyond performance, we compared the different methods using other criteria (e.g., the ability to constrain the prediction range or to quantify prediction uncertainty) and discussed their implications for selecting a modelling approach tailored to the needs of the end user.
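As an illustration of the simplest of the interpolation methods named in the abstract, inverse distance weighting, a minimal Python sketch follows. It is not the authors' implementation: the function names `idw_predict` and `r_squared`, the power parameter of 2, the planar distance, and the toy cluster values are all assumptions made for this example, and the R² shown is the coefficient of determination, which may differ from the definition used in the study.

```python
import math


def idw_predict(samples, target, power=2.0):
    """Inverse distance weighting at one target location.

    samples: list of ((x, y), value) pairs, e.g. survey cluster
             coordinates with an indicator value (hypothetical data).
    target:  (x, y) location to predict.
    power:   distance exponent (2.0 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # Target coincides with a sample point: return its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy example: two clusters with a prevalence-like indicator.
clusters = [((0.0, 0.0), 0.2), ((2.0, 0.0), 0.6)]
midpoint_estimate = idw_predict(clusters, (1.0, 0.0))
```

Because the weights are non-negative and sum to one after normalisation, an IDW prediction always lies between the smallest and largest observed values, which is one simple way a method can constrain the prediction range. The sketch provides no measure of prediction uncertainty.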