Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States of America.
Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States of America.
PLoS One. 2021 Jan 7;16(1):e0244173. doi: 10.1371/journal.pone.0244173. eCollection 2021.
The novel coronavirus disease (COVID-19) is an emergent disease that initially had no historical data to guide scientists in predicting or forecasting its global or national impact over time. The ability to predict the progress of this pandemic has been crucial for decision making aimed at fighting the pandemic and controlling its spread. In this work we considered four different statistical/time series models that are readily available from the 'forecast' package in R. We performed novel applications with these models, forecasting the number of confirmed cases, and similarly the numbers of deaths and recoveries, along with the corresponding 90% prediction interval to estimate uncertainty around pointwise forecasts. Since the future may not repeat the past for this pandemic, no prediction model is certain. However, any prediction tool with acceptable prediction performance (or prediction error) could still be very useful for public-health planning to handle the spread of the pandemic, and could inform policy decision-making and facilitate the transition to normality. These four models were applied to publicly available data on the COVID-19 pandemic for both the USA and Italy. We observed that all models reasonably predicted the future numbers of confirmed cases, deaths, and recoveries of COVID-19. However, for the majority of the analyses, the autoregressive integrated moving average (ARIMA) time series model and the cubic smoothing spline model both had smaller prediction errors and narrower prediction intervals than the Holt and Trigonometric Exponential smoothing state space model with Box-Cox transformation (TBATS) models. Therefore, the former two models were preferable to the latter two. Given the similar performance of the models in the USA and Italy, the corresponding prediction tools can be applied to other countries grappling with the COVID-19 pandemic, and to any pandemics that may occur in the future.
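To make the idea of a pointwise forecast with a 90% prediction interval concrete, the following is a minimal Python sketch of one of the four model families named above, Holt's linear trend method. It is not the authors' code, which used the R 'forecast' package. The function name `holt_forecast`, the fixed smoothing parameters `alpha` and `beta`, the simple initial state, and the input numbers are all assumptions made for illustration; a fitted model would estimate the parameters from the data. The interval uses the standard forecast-variance expression for Holt's method with additive, normally distributed errors.

```python
from math import sqrt
from statistics import NormalDist


def holt_forecast(y, h, alpha=0.8, beta=0.2, level=0.90):
    """Holt's linear trend forecasts with approximate prediction intervals.

    alpha and beta are fixed, assumed smoothing parameters; a fitted model
    would estimate them from the data.
    Returns a list of (point, lower, upper) tuples for horizons 1..h.
    """
    if len(y) < 3:
        raise ValueError("need at least three observations")

    # Initial state: first observation as level, first difference as trend.
    lvl, trend = y[0], y[1] - y[0]
    errors = []
    for t in range(1, len(y)):
        one_step = lvl + trend          # one-step-ahead forecast for y[t]
        err = y[t] - one_step
        lvl = one_step + alpha * err    # error-correction form of the level update
        trend = trend + beta * err      # error-correction form of the trend update
        if t >= 2:                      # t = 1 is fitted exactly by the initial state
            errors.append(err)

    sigma2 = sum(e * e for e in errors) / len(errors)   # one-step error variance
    z = NormalDist().inv_cdf(0.5 + level / 2)           # normal quantile for the interval

    out = []
    for k in range(1, h + 1):
        point = lvl + k * trend
        # Forecast variance grows with the horizon k.
        var_k = sigma2 * (1 + (k - 1) * (alpha**2 + alpha * beta * k
                                         + beta**2 * k * (2 * k - 1) / 6))
        half = z * sqrt(var_k)
        out.append((point, point - half, point + half))
    return out


# Made-up cumulative counts, for illustration only (not data from the study).
cases = [100, 112, 121, 135, 150, 158, 171]
for point, lower, upper in holt_forecast(cases, h=3):
    print(f"{point:.1f}  [{lower:.1f}, {upper:.1f}]")
```

The interval widens as the horizon increases, which is why interval width, alongside prediction error, is a natural criterion for comparing models.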