Roster Kirstin, Connaughton Colm, Rodrigues Francisco A
Institute of Mathematics and Computer Science, University of São Paulo, Avenida Trabalhador São Carlense 400, São Carlos 13566-590, São Paulo, Brazil.
Mathematics Institute, University of Warwick, Coventry CV4 7AL, United Kingdom.
Chaos Solitons Fractals. 2022 Aug;161:112306. doi: 10.1016/j.chaos.2022.112306. Epub 2022 Jun 23.
Recent infectious disease outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated both the importance and difficulty of accurately forecasting novel infectious diseases. When new diseases first emerge, we have little knowledge of the transmission process, the level and duration of immunity to reinfection, or other parameters required to build realistic epidemiological models. Time series forecasts and machine learning, while less reliant on assumptions about the disease, require large amounts of data that are also not available in early stages of an outbreak. In this study, we examine how knowledge of related diseases can help make predictions of new diseases in data-scarce environments using transfer learning. We implement both an empirical and a synthetic approach. Using data from Brazil, we compare how well different machine learning models transfer knowledge between two different dataset pairs: case counts of (i) dengue and Zika, and (ii) influenza and COVID-19. In the synthetic analysis, we generate data with an SIR model using different transmission and recovery rates, and then compare the effectiveness of different transfer learning methods. We find that transfer learning offers the potential to improve predictions, even beyond a model based on data from the target disease, though the appropriate source disease must be chosen carefully. While imperfect, these models offer an additional input for decision makers for pandemic response.
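The synthetic analysis described above can be illustrated with a minimal sketch of SIR data generation. The abstract does not give the authors' implementation, so everything here is an assumption chosen for illustration: the function name `simulate_sir`, the forward Euler integration, the population size, the initial number of infected, and the specific transmission (`beta`) and recovery (`gamma`) rates.

```python
def simulate_sir(beta, gamma, n_days=160, population=1_000_000,
                 initial_infected=10, steps_per_day=10):
    """Return daily new-infection counts from a deterministic SIR model.

    Hypothetical helper: integrates dS/dt = -beta*S*I/N and
    dI/dt = beta*S*I/N - gamma*I with forward Euler sub-steps.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    daily_cases = []
    for _ in range(n_days):
        new_today = 0.0
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            new_today += new_infections
        daily_cases.append(new_today)
    return daily_cases


# Illustrative (assumed) parameters: a "source" and a "target" disease
# that differ in their transmission and recovery rates.
source_cases = simulate_sir(beta=0.30, gamma=0.10)
target_cases = simulate_sir(beta=0.45, gamma=0.15)
```

Series generated this way with different `beta` and `gamma` values could stand in for a source and a target disease, so that a model trained on one can be evaluated on the early portion of the other.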