Early prediction of SARS-CoV-2 reproductive number from environmental, atmospheric and mobility data: A supervised machine learning approach.

Author information

Caruso Pier Francesco, Angelotti Giovanni, Greco Massimiliano, Guzzetta Giorgio, Cereda Danilo, Merler Stefano, Cecconi Maurizio

Affiliations

Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072 Pieve Emanuele - Milan, Italy; Department of Anesthesiology and Intensive Care, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano - Milan, Italy.

Artificial Intelligence Center, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089 Rozzano - Milan, Italy.

Publication information

Int J Med Inform. 2022 Apr 1;162:104755. doi: 10.1016/j.ijmedinf.2022.104755.

Abstract

INTRODUCTION

The WHO declared COVID-19, the disease caused by SARS-CoV-2, a pandemic on March 11th, 2020. Public protective measures were enforced worldwide to limit the spread of the virus. Its transmission, mainly by droplets, is measured by the effective reproduction number (Rt), the number of secondary cases caused in a population by an average infectious individual at time t. Current strategies for calculating Rt reflect the number of secondary cases only after several days, because of the delay between symptom onset and reporting. We propose a complementary Rt estimate that uses supervised machine learning techniques to predict short-term variations with more timely results.

MATERIAL AND METHODS

Our primary goal was to predict the current-day Rt in the twelve provinces of Lombardy with the highest possible accuracy and independently of local testing strategies. We gathered mobility, weather, and pollution data from different public sources as proxies for human behavior and public health measures. We built four supervised machine learning models with different strategies; in each, the outcome variable was the daily median Rt value per province obtained from officially adopted algorithms.
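The abstract does not spell out how the "differential" outcome was computed or how the previous-day Rt was supplied as a feature. The following is a minimal Python sketch under stated assumptions: the differential is taken to be the first difference Rt(t) - Rt(t-1), and the function names `build_targets` and `reconstruct` are hypothetical, introduced only for illustration. It is not the authors' pipeline.

```python
from typing import List, Tuple


def build_targets(rt: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Align three series on days 1..n-1 of one province's daily Rt.

    Returns (raw, diff, prev):
      raw  - Rt of the current day (raw-value outcome)
      diff - Rt(t) - Rt(t-1) (ASSUMED meaning of "differential" outcome)
      prev - Rt of the day before (optional input feature)
    """
    raw = rt[1:]
    prev = rt[:-1]
    diff = [today - yesterday for today, yesterday in zip(raw, prev)]
    return raw, diff, prev


def reconstruct(rt_start: float, predicted_diffs: List[float]) -> List[float]:
    """Turn predicted day-to-day changes back into an Rt path.

    Assumes a known starting level rt_start; each predicted change is
    added cumulatively to the running level.
    """
    path = []
    level = rt_start
    for change in predicted_diffs:
        level += change
        path.append(level)
    return path


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    rt_series = [1.0, 1.5, 1.25, 1.0]
    raw, diff, prev = build_targets(rt_series)
    print(diff)
    print(reconstruct(rt_series[0], diff))
```

Under this reading, a model trained on `diff` without `prev` among its features has to infer the direction of change from mobility, weather, and pollution alone, which may be why such a model would spread its weights across more features.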

RESULTS

Data covering 243 days for every province (from February 15th, 2020, to October 14th, 2020) were presented to our four models. The two models that used the differential calculation of Rt instead of the raw values showed the highest mean coefficient of determination (0.93 for both); their residuals also had the lowest mean error (-0.03 and 0.01) and standard deviation (0.13 for both). The model with access to the previous day's Rt value relied heavily on that feature for prediction, while the other had more evenly distributed weights.
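For readers who want to see how metrics of this kind are typically computed, here is a small Python sketch of the coefficient of determination and of the mean and standard deviation of the residuals. The sign convention (observed minus predicted) and the use of the sample standard deviation are assumptions; the abstract does not state which conventions were used, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def r_squared(observed: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def residual_summary(observed: List[float], predicted: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of residuals.

    Residuals are ASSUMED to be observed minus predicted, so a negative
    mean indicates over-prediction on average.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    return mean(residuals), stdev(residuals)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.5, 3.5, 4.5]
    print(r_squared(observed, predicted))
    print(residual_summary(observed, predicted))
```

A "mean" coefficient of determination, as reported above, would then presumably be the average of per-province values, although the abstract does not say how the averaging was done.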

DISCUSSION

The model that did not have access to the Rt value of the previous day and used the Rt differential value as outcome (FDRt) was considered the most robust according to the metrics. Its forecasts captured the trend that Rt values would follow over different weeks, but were not particularly accurate in predicting the precise value of Rt. A correlation among mobility, atmospheric features, pollution and Rt values is plausible, but further testing should be performed.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4c0/8970608/82cdf8f79306/gr1_lrg.jpg
