Laboratory of Integrative Systems Physiology, Ecole Polytechnique Fédérale de Lausanne, EPFL/IBI/LISP Lausanne, Switzerland.
ENS Paris-Saclay, CNRS, Centre Borelli, Université Paris-Saclay, Gif-sur-Yvette, France.
PLoS Comput Biol. 2023 Jun 21;19(6):e1010790. doi: 10.1371/journal.pcbi.1010790. eCollection 2023 Jun.
The COVID-19 pandemic has created a radically new situation in which most countries provide raw measurements of their daily incidence and disclose them in real time. This enables new machine learning forecast strategies in which the prediction is no longer based solely on the past values of the current incidence curve but can take advantage of observations from many countries. We present such a simple global machine learning procedure, which uses all past daily incidence trend curves. Each of the 27,418 COVID-19 incidence trend curves in our database contains the values of 56 consecutive days extracted from observed incidence curves across 61 world regions and countries. Given a current incidence trend curve observed over the past four weeks, its forecast for the next four weeks is computed by matching it with the first four weeks of all samples and ranking the samples by their similarity to the query curve. The 28-day forecast is then obtained by a statistical estimation combining the values of the last 28 days of those similar samples. Using comparisons performed by the European Covid-19 Forecast Hub against current state-of-the-art forecast methods, we verify that the proposed global learning method, EpiLearn, compares favorably with methods that forecast from a single past curve.
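The sample construction and matching-and-aggregation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' EpiLearn implementation: the helper names `extract_samples` and `forecast`, the normalization of each window by its last observed (day 28) value, the Euclidean distance, the number of retained neighbours `k`, and the per-day median used as the "statistical estimation" are all hypothetical choices made here for concreteness.

```python
import math
import statistics
from typing import Sequence

PAST = 28               # days of the query matched against each sample's first four weeks
FUTURE = 28             # days forecast
WINDOW = PAST + FUTURE  # 56-day samples, as in the abstract


def extract_samples(curve: Sequence[float], window: int = WINDOW) -> list[list[float]]:
    """Cut one observed incidence curve into all overlapping 56-day windows."""
    return [list(curve[i:i + window]) for i in range(len(curve) - window + 1)]


def _scale(ref: float) -> float:
    # Assumption: rescale by the last observed value so regions of different
    # size become comparable; fall back to 1 when that value is zero.
    return ref if ref > 0 else 1.0


def forecast(query: Sequence[float], samples: list[list[float]], k: int = 10) -> list[float]:
    """Hypothetical nearest-neighbour forecast of the next FUTURE days."""
    if len(query) != PAST:
        raise ValueError("query must cover exactly PAST days")
    q_scale = _scale(query[-1])
    q_norm = [v / q_scale for v in query]

    scored = []
    for s in samples:
        s_scale = _scale(s[PAST - 1])
        head = [v / s_scale for v in s[:PAST]]   # first four weeks of the sample
        tail = [v / s_scale for v in s[PAST:]]   # last four weeks of the sample
        scored.append((math.dist(q_norm, head), tail))

    # Rank samples by similarity to the query and keep the k closest.
    scored.sort(key=lambda item: item[0])
    nearest = [tail for _, tail in scored[:k]]

    # Assumed estimator: per-day median of the neighbours' tails, rescaled to the query.
    return [statistics.median(day) * q_scale for day in zip(*nearest)]
```

For example, `forecast(last_28_days, samples, k=10)` returns 28 values for the following four weeks. Pooling windows from many regions into `samples` is what makes the method "global": the query is compared with trajectories observed anywhere, not only with its own history.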