Department of Statistics, Ludwig Maximilian Universität, München, Germany.
Institute for Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.
Sci Rep. 2022 Mar 10;12(1):3930. doi: 10.1038/s41598-022-07757-5.
During 2020, the infection rate of COVID-19 was investigated by many scholars from different research fields. In this context, reliable and interpretable forecasts of disease incidence are a vital tool for policymakers to manage healthcare resources, and several experts have called for human mobility to be taken into account when explaining the spread of COVID-19. Existing approaches often apply the standard models of their respective research fields, which frequently restricts modeling possibilities. For instance, most statistical or epidemiological models cannot directly incorporate unstructured data sources, including relational data that may encode human mobility. In contrast, machine learning approaches may yield better predictions by exploiting these data structures, yet they lack intuitive interpretability because they are often black-box models. We propose a combination of both research directions and present a multimodal learning framework that combines statistical regression and machine learning models for predicting local COVID-19 cases in Germany. Results and implications: the proposed approach enables the use of a richer collection of data types, including mobility flows and colocation probabilities, and yields the lowest mean squared error scores throughout the observational period in the reported benchmark study. The results corroborate that, during most of the observational period, more dispersed meeting patterns and a lower percentage of people staying put are associated with higher infection rates. Moreover, the analysis underscores the necessity of including mobility data and showcases the flexibility and interpretability of the proposed approach.