Machine Learning and Probabilistic Approaches for Forecasting COVID-19 Transmission and Cases.

Author information

Hossain Md Sakhawat, Goyal Ravi, Martin Natasha K, DeGruttola Victor, Ahammed Tanvir, McMahan Christopher, Rennert Lior

Affiliations

Department of Public Health Sciences, Clemson University, Clemson, SC, USA.

Center for Public Health Modeling and Response, Clemson University, Clemson, SC, USA.

Publication information

medRxiv. 2025 Jun 24:2025.06.24.25330210. doi: 10.1101/2025.06.24.25330210.

Abstract

Forecasting the effective reproductive number (Rt) and COVID-19 case counts is critical for guiding public health responses. We developed a machine learning and probabilistic forecasting framework to predict Rt and daily case counts at the county level in South Carolina (SC). Our approach used initial Rt estimates from the EpiNow2 R package, refined with spatial (covariate-adjusted) smoothing. We then generated Rt forecasts using an ensemble of regression, Random Forest, and XGBoost models, and predicted case counts with a probabilistic Poisson model. This ensemble-based approach consistently outperformed EpiNow2 across forecast horizons (7-day, 14-day, and 21-day). In the first forecast period (November 11, 2020 - February 02, 2021), the ensemble achieved a median percentage agreement (PA) across counties of 94.4% (IQR: 93.8% - 95.3%) for the 7-day-ahead Rt forecast, compared to 87.0% (IQR: 84.4% - 89.4%) from EpiNow2. In the second period (December 11, 2022 - March 04, 2023), the ensemble attained a 93.0% median PA across counties for the Rt forecast (IQR: 91.3% - 94.1%), while EpiNow2 reached 86.8% (IQR: 82.5% - 89.2%). Similar trends were observed for case forecasts, with the ensemble model demonstrating improved stability and performance. Combining spatial smoothing with ensemble modeling improves epidemic forecasting by enhancing predictive performance and robustness.
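
The two-stage idea in the abstract (combine several Rt forecasts, then turn Rt into a probabilistic case forecast) can be illustrated with a minimal sketch. The abstract does not say how the three models are weighted or how the Poisson mean is formed, so everything below is an assumption for illustration: the equal-weight average, the renewal-style mean (Rt times a generation-interval-weighted sum of recent cases), and the names `ensemble_rt`, `expected_cases`, and `poisson_quantile` are all hypothetical, and the input numbers are made up.

```python
import math


def ensemble_rt(model_forecasts):
    """Equal-weight average of per-model Rt forecasts.

    Assumption: equal weights. The abstract does not state the weighting.
    """
    return sum(model_forecasts.values()) / len(model_forecasts)


def expected_cases(rt, recent_cases, gen_weights):
    """Renewal-style Poisson mean (an assumed form, not taken from the paper).

    recent_cases[0] is yesterday's count, recent_cases[1] the day before, etc.
    gen_weights[s] is the assumed generation-interval weight for lag s + 1.
    """
    return rt * sum(w * c for w, c in zip(gen_weights, recent_cases))


def poisson_quantile(lam, q):
    """Smallest k with P(X <= k) >= q for X ~ Poisson(lam).

    Uses the recursive pmf; adequate for moderate lam (exp(-lam) underflows
    for very large lam), which is enough for this illustration.
    """
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


# Illustrative inputs only (not data from the study).
forecasts = {"regression": 1.10, "random_forest": 1.00, "xgboost": 1.20}
rt_hat = ensemble_rt(forecasts)

lam = expected_cases(rt_hat, recent_cases=[40, 50, 60], gen_weights=[0.5, 0.3, 0.2])
lower, upper = poisson_quantile(lam, 0.025), poisson_quantile(lam, 0.975)

print(f"ensemble Rt = {rt_hat:.2f}, expected cases = {lam:.1f}, "
      f"95% interval = [{lower}, {upper}]")
```

The interval comes directly from the Poisson distribution around the forecast mean, which is one simple way a point forecast of Rt can yield a probabilistic case forecast; other weightings or count distributions would slot into the same structure.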

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b082/12262794/aeb6bf933173/nihpp-2025.06.24.25330210v1-f0001.jpg
