Wu Jiande, Tanim Shakhawat, Woo MinJae, Ahammed Tanvir, Rennert Lior
Department of Public Health Sciences, Clemson University, Clemson, SC, USA.
Center for Public Health Modeling and Response, Clemson University, Clemson, SC, USA.
medRxiv. 2025 Jun 24:2025.06.24.25330211. doi: 10.1101/2025.06.24.25330211.
The COVID-19 pandemic has highlighted the urgent need for accurate prediction of pandemic trends. We propose a deep learning model for predicting COVID-19 cases and deaths at the county level using transformer neural networks with multi-source data fusion. The model incorporates historical case data, death data, and social media sentiment analysis to capture both temporal (historical trends) and spatial (geographical relationships) dynamics within time series data. Additionally, we develop multi-level and multi-scale attention mechanisms for adaptive time-frequency analysis. Across three Omicron variant waves (December 2021 through February 2023), the model demonstrated strong performance in predicting county-level COVID-19 cases and deaths, with median county agreement accuracy ranging from 74.0% to 82.6% for one-week case forecasts and 68.7% to 79.6% for five-week case forecasts. Median county agreement accuracy for deaths ranged from 83.2% to 86.3% for one-week forecasts and 84.3% to 87.2% for five-week forecasts. Incorporating social media data yielded mild to moderate improvement in forecasting accuracy. Overall, the proposed model yielded substantial improvements compared to a baseline persistence model utilizing the last observation carried forward. By integrating real-time data and capturing complex pandemic dynamics, this approach surpasses traditional methods. Its high accuracy and generalizability make it a valuable tool for enhancing public health preparedness and response strategies in future outbreaks.
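For readers unfamiliar with the baseline mentioned above, a persistence model that carries the last observation forward can be sketched in a few lines. This is an illustrative sketch only: the function name `locf_forecast`, the weekly aggregation, and the sample counts are assumptions for demonstration and are not taken from the paper, which does not describe its baseline implementation in the abstract.

```python
from typing import List, Sequence


def locf_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Persistence baseline: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if horizon < 1:
        raise ValueError("horizon must be a positive number of steps")
    # Every forecast step, from one week ahead to `horizon` weeks ahead,
    # is simply the most recent observation.
    return [history[-1]] * horizon


# Hypothetical weekly case counts for a single county (made-up values).
weekly_cases = [120, 135, 160, 210, 190]
print(locf_forecast(weekly_cases, horizon=5))
```

Because such a baseline ignores trend and seasonality entirely, it is a common reference point: a forecasting model is only useful if it improves on simply assuming that next week looks like this week.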