London School of Hygiene & Tropical Medicine, London, United Kingdom.
European Centre for Disease Prevention and Control (ECDC), Stockholm, Sweden.
eLife. 2023 Apr 21;12:e81916. doi: 10.7554/eLife.81916.
Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022.
We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts of COVID-19 cases and deaths, as reported by a standardised source, for 32 countries over the next 1-4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally weighted average (initially the mean and, from 26 July 2021, the median) of all individual models' predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), which compares a model's forecast accuracy with that of all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models' past predictive performance.
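The quantile-wise combination described above can be illustrated with a minimal sketch. It is not the Hub's own code: the function name `quantile_ensemble`, the model names, and all numbers are hypothetical, and it assumes every model reports the same quantile levels.

```python
from statistics import mean, median


def quantile_ensemble(forecasts, method="median"):
    """Combine models' quantile forecasts level by level with equal weights.

    forecasts: dict mapping model name -> dict mapping quantile level -> value.
    Assumption: all models report the same set of quantile levels.
    """
    combine = median if method == "median" else mean
    levels = sorted(next(iter(forecasts.values())))
    return {q: combine([f[q] for f in forecasts.values()]) for q in levels}


# Hypothetical one-week-ahead case forecasts (illustrative numbers only).
forecasts = {
    "model_a": {0.25: 90.0, 0.5: 100.0, 0.75: 115.0},
    "model_b": {0.25: 95.0, 0.5: 110.0, 0.75: 130.0},
    "model_c": {0.25: 300.0, 0.5: 400.0, 0.75: 600.0},  # an outlying model
}

median_ensemble = quantile_ensemble(forecasts, "median")
mean_ensemble = quantile_ensemble(forecasts, "mean")

print(median_ensemble)  # each level takes the middle model's value
print(mean_ensemble)    # each level is pulled upwards by model_c
```

In this toy case the median ensemble ignores the outlying model at every quantile level, whereas the mean ensemble is shifted towards it.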
Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models' forecast scores in comparison to the ensemble model. We found that the weekly ensemble performed consistently well across countries and over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models' forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models' forecasts of deaths (N=763 predictions from 20 models). Across a 1-4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods, we found that the most influential and best choice was to use the median of the models' forecasts rather than the mean, regardless of how the component forecast models were weighted.
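For readers unfamiliar with the score behind these comparisons, the following sketch computes the standard weighted interval score for a single quantile forecast, given its median and a set of central prediction intervals. The function names and numbers are hypothetical. The relative WIS used in the study additionally compares scores across models, and that step is not reproduced here.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty, scaled by 2 / alpha,
    for an observation y that falls below or above the interval.
    """
    width = upper - lower
    under = (2 / alpha) * max(lower - y, 0.0)
    over = (2 / alpha) * max(y - upper, 0.0)
    return width + under + over


def weighted_interval_score(median, intervals, y):
    """WIS from a predictive median and a dict {alpha: (lower, upper)}.

    Each interval score is weighted by alpha / 2 and the absolute error of
    the median by 1 / 2; the sum is divided by (number of intervals + 1/2).
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast: median 110, a 50% interval and a 90% interval.
intervals = {0.5: (95.0, 130.0), 0.1: (80.0, 160.0)}

wis_inside = weighted_interval_score(110.0, intervals, y=120.0)   # y inside both intervals
wis_outside = weighted_interval_score(110.0, intervals, y=200.0)  # y above both intervals

print(wis_inside, wis_outside)
```

Lower values are better: a forecast is rewarded for narrow intervals and penalised when the observation falls outside them.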
Our results support combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than on incident case forecasts at forecast horizons greater than 2 weeks.
AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No. OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695. AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE, MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institute of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control.
IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).