Price Bradley S, Khodaverdi Maryam, Halasz Adam, Hendricks Brian, Kimble Wesley, Smith Gordon S, Hodder Sally L
Management Information Systems Department, West Virginia University, Morgantown, West Virginia.
West Virginia Clinical and Translational Science Institute, Morgantown, West Virginia.
medRxiv. 2021 Oct 7:2021.10.06.21264569. doi: 10.1101/2021.10.06.21264569.
During the COVID-19 pandemic, West Virginia developed an aggressive SARS-CoV-2 testing strategy that included pop-up mobile testing in locations anticipated to have near-term increases in SARS-CoV-2 infections. In this study, we describe and compare two methods for predicting near-term SARS-CoV-2 incidence in West Virginia counties. The first method, Rt Only, produces forecasts for each county based solely on the daily instantaneous reproductive number, Rt. The second method, ML+Rt, is a machine learning approach that uses a Long Short-Term Memory network to predict the near-term number of cases for each county from epidemiological statistics such as Rt, county population information, and time series trends including information on major holidays, while also leveraging statewide COVID-19 trends across counties. Both approaches used daily county-level SARS-CoV-2 incidence data provided by the West Virginia Department of Health and Human Resources beginning April 2020. The methods are compared on the accuracy of their county-level predictions of near-term SARS-CoV-2 increases over 17 weeks, from January 1, 2021 to April 30, 2021. Both methods performed well (the week-over-week correlation between the forecasted and actual number of cases is 0.872 for the ML+Rt method and 0.867 for the Rt Only method) but differ in performance at various time points. Over the 17-week assessment period, the ML+Rt method outperforms the Rt Only method in identifying larger spikes. We also find that both methods perform adequately in both rural and non-rural predictions. Finally, we provide a detailed discussion of practical issues in implementing forecasting models for public health action based on Rt, and of the potential for further development of machine learning methods that are enhanced by Rt.
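As an illustration of how a forecast driven only by Rt might work, the sketch below projects daily cases forward with the standard renewal equation, in which expected new cases equal Rt times a weighted sum of recent cases. The abstract does not state how the Rt Only forecasts were actually computed, so this is an assumption for illustration, not the authors' method. The function name project_cases, the serial-interval weights, and the example case counts are all hypothetical.

```python
import numpy as np


def project_cases(recent_cases, r_t, serial_weights, horizon=7):
    """Project daily cases forward with the renewal equation (illustrative only).

    recent_cases   : recent daily case counts, oldest first
    r_t            : instantaneous reproductive number, held constant here
    serial_weights : hypothetical serial-interval weights; entry k is the
                     weight for a lag of k + 1 days (normalized to sum to 1)
    horizon        : number of days to project
    """
    w = np.asarray(serial_weights, dtype=float)
    w = w / w.sum()
    history = [float(c) for c in recent_cases]
    if len(history) < len(w):
        raise ValueError("need at least len(serial_weights) days of history")

    forecast = []
    for _ in range(horizon):
        # Reverse so the most recent day lines up with the 1-day lag weight.
        lagged = np.array(history[-len(w):][::-1])
        expected = r_t * float(np.dot(w, lagged))
        forecast.append(expected)
        history.append(expected)  # projected days feed later projections
    return np.array(forecast)


if __name__ == "__main__":
    # Made-up county data and weights, not values from the study.
    cases = [12, 15, 14, 18, 20, 22, 25]
    weights = [0.1, 0.2, 0.3, 0.25, 0.15]
    weekly_total = project_cases(cases, r_t=1.2, serial_weights=weights).sum()
    print(f"Projected cases over the next 7 days: {weekly_total:.1f}")
```

Holding Rt constant over the horizon is the simplest possible choice; a forecast meant for public health action would also need to account for uncertainty in Rt and for reporting delays.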