ARC Centre for Data Analytics for Resources and Environments, Sydney, Australia.
School of Mathematics and Statistics, The University of Sydney, Sydney, Australia.
Eur J Epidemiol. 2020 Aug;35(8):733-742. doi: 10.1007/s10654-020-00669-6. Epub 2020 Aug 11.
Forecasting models have been influential in shaping decision-making in the COVID-19 pandemic. However, there is concern that their predictions may have been misleading. Here, we dissect the predictions made by four models for the daily COVID-19 death counts between March 25 and June 5 in New York state, as well as the predictions of ICU bed utilisation made by the influential IHME model. We evaluated the accuracy of the point estimates and the accuracy of the uncertainty estimates of the model predictions. First, we compared the "ground truth" data sources on daily deaths against which these models were trained. Three different data sources were used by these models, and these had substantial differences in recorded daily death counts. Two additional data sources that we examined also provided different death counts per day. For accuracy of prediction, all models fared very poorly. Only 10.2% of the predictions fell within 10% of their training ground truth, irrespective of distance into the future. For accurate assessment of uncertainty, only one model matched the nominal 95% coverage relatively well, but that model did not start predictions until April 16 and thus had no impact on early, major decisions. For ICU bed utilisation, the IHME model was highly inaccurate; the point estimates only started to match ground truth after the pandemic wave had started to wane. We conclude that trustworthy models require trustworthy input data to be trained upon. Moreover, models need to be subjected to prespecified real-time performance tests before their results are provided to policy makers and public health officials.
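The two accuracy measures reported above (the share of point predictions within 10% of ground truth, and the empirical coverage of nominal 95% intervals) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the handling of zero-death days, and all numbers below are hypothetical and chosen only to show the arithmetic.

```python
from typing import Sequence


def fraction_within_tolerance(
    predicted: Sequence[float],
    observed: Sequence[float],
    tol: float = 0.10,
) -> float:
    """Share of point predictions whose relative error is at most `tol`."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for p, o in zip(predicted, observed):
        if o == 0:
            # Assumption for this sketch: with zero observed deaths, only an
            # exact prediction of zero counts as a hit.
            hits += int(p == 0)
        elif abs(p - o) <= tol * abs(o):
            hits += 1
    return hits / len(observed)


def interval_coverage(
    lower: Sequence[float],
    upper: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Share of observations that fall inside their prediction interval."""
    if not (len(lower) == len(upper) == len(observed)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for lo, hi, o in zip(lower, upper, observed) if lo <= o <= hi)
    return hits / len(observed)


# Hypothetical daily death counts and forecasts (not data from the study).
observed = [100, 200, 400, 500]
predicted = [105, 260, 380, 700]
lower_95 = [80, 150, 300, 550]
upper_95 = [130, 250, 450, 900]

print(fraction_within_tolerance(predicted, observed))  # 0.5
print(interval_coverage(lower_95, upper_95, observed))  # 0.75
```

A well-calibrated model would show coverage close to 0.95 over many prediction days; coverage far below that value suggests the stated uncertainty intervals are too narrow.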