Lynch Christopher J, Gore Ross
Old Dominion University - Virginia Modeling, Analysis, and Simulation Center (VMASC), United States.
Data Brief. 2021 Apr;35:106759. doi: 10.1016/j.dib.2021.106759. Epub 2021 Jan 15.
The coronavirus disease 2019 (COVID-19) has spread rapidly across the world since its appearance in December 2019. This data set contains one-, three-, and seven-day forecasts of cumulative COVID-19 case counts at the county, health district, and state geographic levels for the state of Virginia. Forecasts are created over the first 46 days of reported COVID-19 cases using the cumulative case count data provided by as of April 22, 2020. From these historical data, lookback windows of one, three, seven, and all days prior to the forecast start date are used to generate the forecasts. Forecasts are created using: (1) a Naïve approach; (2) Holt-Winters exponential smoothing (HW); (3) growth rate (Growth); (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are computed for each forecast to evaluate it against the existing historical data. These error metrics are aggregated to assess which combination of forecast method, forecast length, and lookback length fits best, based on the lowest aggregated error at each geographic level. The data set comprises an R Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables. All code and data files are included to provide transparency and to facilitate replicability and reproducibility. The package opens directly in RStudio through the R Project file, which removes the need to set path locations for the folders contained within the data set and so simplifies setup.
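To illustrate how a forecast, its lookback window, and the two error metrics fit together, here is a short Python sketch. It is not the authors' R code. The following choices are all assumptions:

- The growth-rate formula is the mean of the day-over-day ratios within the lookback window, compounded forward.
- Days with zero actual cases are skipped in MdAPE.
- The median is used to aggregate errors across forecasts.

`best_combination` is a hypothetical helper.

```python
import math
from statistics import median


def naive_forecast(history, horizon):
    """Naive forecast: repeat the last observed cumulative count."""
    return [history[-1]] * horizon


def growth_forecast(history, horizon, lookback):
    """Assumed growth-rate forecast: average the day-over-day growth ratios
    within the last `lookback` days, then compound from the last value.
    Pass lookback=len(history) - 1 for an "all days" lookback."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1 day")
    window = history[-(lookback + 1):]
    # Skip ratios whose base is zero (early days with no reported cases).
    ratios = [b / a for a, b in zip(window, window[1:]) if a > 0]
    rate = sum(ratios) / len(ratios) if ratios else 1.0
    return [history[-1] * rate ** (h + 1) for h in range(horizon)]


def mdae(actual, forecast):
    """Median Absolute Error over the forecast horizon."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have equal length")
    return median([abs(a - f) for a, f in zip(actual, forecast)])


def mdape(actual, forecast):
    """Median Absolute Percentage Error; days with zero actual cases are
    skipped (an assumption), returning NaN if nothing remains."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have equal length")
    pct = [abs(a - f) / abs(a) * 100 for a, f in zip(actual, forecast) if a != 0]
    return median(pct) if pct else math.nan


def best_combination(errors):
    """Hypothetical helper: `errors` maps (method, horizon, lookback) to a
    list of per-forecast error values; each list is aggregated by its median
    and the key with the lowest aggregate is returned."""
    return min(errors, key=lambda key: median(errors[key]))


if __name__ == "__main__":
    history = [10, 12, 15, 18, 22]  # illustrative cumulative counts
    actual = [26, 31, 37]           # illustrative counts for the next three days
    naive = naive_forecast(history, 3)
    print(naive, mdae(actual, naive), round(mdape(actual, naive), 2))
```

Here is a worked example with these illustrative numbers. The three-day Naïve forecast is `[22, 22, 22]`, and its absolute errors are 4, 9, and 15 cases. That gives an MdAE of 9 cases and an MdAPE of roughly 29%. The median-based metrics are less sensitive than mean-based ones to a single badly missed day. This is one reason to prefer them when comparing many short forecasts.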
This data set provides two avenues for reproducing results: 1) use the provided code to generate the forecasts from scratch and then run the analyses; or 2) load the saved forecast data and run the analyses on the stored data. Code annotations provide the instructions needed for both routes. The code can generate the same set of forecasts and error metrics for any US state by altering the state parameter within the source code. Users can also generate health district forecasts for any other state by providing a file that maps each county within the state to its respective health district. The source code can be connected to the most up-to-date version of the COVID-19 dataset, allowing forecasts to be generated up to the most recently reported data and facilitating near real-time forecasting.
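Health district series are sums of their counties' cumulative counts. This Python sketch shows one way a county-to-district mapping file could be read and applied. The CSV column names `county` and `district`, the labels `A`, `B`, `C`, `D1`, `D2`, and both function names are hypothetical and are not taken from the data set's R code.

```python
import csv


def load_mapping(lines):
    """Read a county-to-health-district mapping from CSV text with the
    hypothetical columns `county` and `district`."""
    return {row["county"]: row["district"] for row in csv.DictReader(lines)}


def aggregate_to_districts(county_counts, county_to_district):
    """Sum aligned daily cumulative-count series from counties into
    their health districts; fails loudly on unmapped counties."""
    unmapped = sorted(set(county_counts) - set(county_to_district))
    if unmapped:
        raise ValueError(f"counties missing from mapping: {unmapped}")
    totals = {}
    for county, series in county_counts.items():
        district = county_to_district[county]
        if district not in totals:
            totals[district] = list(series)
        elif len(series) != len(totals[district]):
            raise ValueError(f"series for {county} has a different length")
        else:
            totals[district] = [t + c for t, c in zip(totals[district], series)]
    return totals


if __name__ == "__main__":
    mapping = load_mapping(["county,district", "A,D1", "B,D1", "C,D2"])
    counts = {"A": [1, 2, 4], "B": [0, 1, 3], "C": [5, 5, 6]}
    print(aggregate_to_districts(counts, mapping))
```

For a real mapping file, `load_mapping(open(path, newline=""))` would replace the inline list. Raising an error on unmapped counties is deliberate. Silently dropping a county would understate its district's counts and bias every downstream error metric.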