Pezanowski Scott, Koua Etien Luc, Okeibunor Joseph C, Gueye Abdou Salam
BrightWorld Labs, State College, PA, USA.
Emergency Preparedness and Response, WHO Regional Office for Africa, Brazzaville, Congo.
Digit Health. 2024 Nov 5;10:20552076241278939. doi: 10.1177/20552076241278939. eCollection 2024 Jan-Dec.
OBJECTIVES: Our research adopts computational techniques to analyze disease outbreaks weekly over a large geographic area while maintaining local-level analysis by incorporating relevant high-spatial-resolution cultural and environmental datasets. The abundance of data about disease outbreaks gives scientists an excellent opportunity to uncover patterns in disease spread and make future predictions. However, data over a sizeable geographic area quickly outpace human cognition. Our study area covers a significant portion of the African continent (about 17,885,000 km²). The data size makes computational analysis vital to assist human decision-makers. METHODS: We first applied global and local spatial autocorrelation to malaria, cholera, meningitis, and yellow fever case counts. We then used machine learning to predict the weekly presence of these diseases in second-level administrative districts. Lastly, we applied machine learning feature importance methods to the variables that affect spread. RESULTS: Our spatial autocorrelation results show that geographic nearness is critical, but its effect varies in strength and across space. Moreover, we identified many interesting hot and cold spots and spatial outliers. The machine learning model predicts a binary class (cases or no cases), with the best F1 score, 0.96, obtained for malaria. Machine learning feature importance uncovered critical cultural and environmental factors affecting outbreaks, as well as variations between diseases. CONCLUSIONS: Our study shows that data analytics and machine learning are vital to understanding and monitoring disease outbreaks locally across vast areas. The speed at which these methods produce insights can be critical during epidemics and emergencies.
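As an illustration of two quantities the abstract relies on, the following minimal sketch computes a global spatial autocorrelation statistic and an F1 score in plain Python. The abstract does not name the autocorrelation statistic used; global Moran's I is assumed here because it is a common choice. The function names, the four-district chain adjacency, and all case counts and labels are hypothetical and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix.

    weights[i][j] > 0 means districts i and j are treated as neighbors.
    Assumes the values are not all identical and at least one weight is nonzero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * cross / variance_sum


def f1_score(y_true, y_pred):
    """F1 score for binary labels (1 = cases present, 0 = no cases)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four districts in a row, each adjacent to the next.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered_cases = [10, 8, 2, 0]     # high counts sit next to high counts
alternating_cases = [10, 0, 10, 0]  # high counts sit next to low counts

print(morans_i(clustered_cases, chain_weights))    # positive: similar neighbors
print(morans_i(alternating_cases, chain_weights))  # negative: dissimilar neighbors

# Hypothetical weekly presence labels for five district-weeks.
observed = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
print(f1_score(observed, predicted))
```

A positive Moran's I indicates that neighboring districts tend to have similar case counts, while a negative value indicates that high and low counts alternate. The F1 score is the harmonic mean of precision and recall for the binary "cases or no cases" prediction.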