Department of Environmental and Occupational Health, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, USA.
Institute of Parasitic Diseases, Sichuan Center for Disease Control and Prevention, Chengdu, China.
Int J Health Geogr. 2023 Jun 2;22(1):12. doi: 10.1186/s12942-023-00331-w.
Although the presence of intermediate host snails is a necessary condition for local schistosomiasis transmission, using them as surveillance targets in areas approaching elimination is challenging because the patchy, dynamic nature of snail habitats makes collecting and testing snails labor-intensive. Meanwhile, geospatial analyses that rely on remotely sensed data are becoming popular tools for identifying environmental conditions that contribute to pathogen emergence and persistence.
In this study, we assessed whether open-source environmental data can be used to predict the presence of human Schistosoma japonicum infections among households with accuracy similar to or better than that of prediction models developed using data from comprehensive snail surveys. To do this, we used infection data collected from rural communities in Southwestern China in 2016 to develop and compare the predictive performance of two Random Forest machine learning models: one built using snail survey data, and one built using open-source environmental data.
The environmental data model outperformed the snail data model in predicting household S. japonicum infection, with an estimated accuracy of 0.89 and a Cohen's kappa of 0.49, compared with an accuracy of 0.86 and a kappa of 0.37 for the snail model. The Normalized Difference Water Index (NDWI, an indicator of surface water presence) within 0.5 to 1 km of the home and the distance from the home to the nearest road were among the top-performing predictors in our final model. Homes were more likely to have infected residents if they were farther from roads or nearer to waterways.
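As an illustration of how the two reported metrics relate, the following minimal sketch computes accuracy and Cohen's kappa from a 2x2 confusion matrix. The function name `accuracy_and_kappa` and all counts are hypothetical and chosen only for illustration; they are not the study's data or code. The sketch shows why kappa can be much lower than accuracy when infected households are rare: kappa subtracts the agreement expected by chance.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement between prediction and truth (plain accuracy).
    observed = (tp + tn) / n

    # Agreement expected by chance, from the marginal proportions.
    predicted_positive = (tp + fp) / n
    actual_positive = (tp + fn) / n
    expected = (predicted_positive * actual_positive
                + (1 - predicted_positive) * (1 - actual_positive))

    if expected == 1.0:
        # Degenerate case: only one class present in both margins.
        return observed, 0.0

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical example: 15 infected households out of 100.
    acc, kappa = accuracy_and_kappa(tp=10, fp=5, fn=5, tn=80)
    print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")

    # A model that predicts "uninfected" for every household still
    # scores high accuracy, but its kappa is zero.
    acc0, kappa0 = accuracy_and_kappa(tp=0, fp=0, fn=15, tn=85)
    print(f"accuracy={acc0:.2f}, kappa={kappa0:.2f}")
```

Under these assumed counts, a classifier that never predicts infection reaches an accuracy of 0.85 but a kappa of zero, which is why reporting both metrics is informative in low-prevalence settings.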
Our results suggest that in low-transmission environments, leveraging open-source environmental data can yield more accurate identification of pockets of human infection than using snail surveys. Furthermore, the variable importance measures from our models point to aspects of the local environment that may indicate increased risk of schistosomiasis. For example, households were more likely to have infected residents if they were farther from roads or were surrounded by more surface water, highlighting areas to target in future surveillance and control efforts.