Department of Statistical Sciences, University of Padova, Italy.
Department of Statistical Sciences, Catholic University of the Sacred Heart, Milano, Italy.
Sci Total Environ. 2021 Apr 10;764:142799. doi: 10.1016/j.scitotenv.2020.142799. Epub 2020 Oct 8.
During the Covid-19 pandemic in Italy, official data are collected through medical swabs following a pure convenience criterion which, at least in an early phase, has favored the examination of patients showing evident symptoms. However, there is evidence of a very high proportion of asymptomatic patients. In this situation, estimating the real number of infected people (and hence the lethality rate) would require a properly designed sample survey, from which the inclusion probabilities could be calculated and sound probabilistic inference drawn. Unfortunately, the survey run by the Italian Statistical Institute encountered many field difficulties. Some researchers have proposed estimates of the total prevalence based on various approaches, including epidemiologic models, time series, and the analysis of data collected in countries that faced the epidemic earlier. In this paper, we propose to estimate the prevalence of Covid-19 in Italy by reweighting the official data published by the Istituto Superiore di Sanità so as to obtain a more representative sample of the Italian population. Reweighting is a procedure commonly used to artificially modify the sample composition so as to obtain a distribution closer to that of the population. Specifically, we post-stratify the official data, using age and gender as post-stratification variables, to derive the weights needed to reweight the sample results, thus obtaining more reliable estimates of prevalence and lethality. For Italy, we obtain a prevalence of 9%. The proposed methodology represents a reasonable approximation while waiting for more reliable data from a properly designed national sample survey, and it could be further improved if more data were made available.
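To make the idea of post-stratification concrete, here is a minimal Python sketch on hypothetical toy data. It is not the authors' implementation, and their exact weighting and lethality computations may differ. It assumes a table of cells (for example age class × gender), each with a known population size N_h, a number of tested people n_h and a number of positive swabs y_h. The names `Stratum`, `post_stratification_weights`, `naive_prevalence`, `post_stratified_prevalence` and `estimated_lethality` are illustrative only. Each tested person in cell h receives the weight (N_h / N) / (n_h / n), so the weighted positivity rate reduces to the sum over cells of (N_h / N) · (y_h / n_h).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """One post-stratification cell, e.g. an (age class, gender) pair (hypothetical)."""
    label: str
    population: int  # N_h: population size of the cell (e.g. from census data)
    tested: int      # n_h: swabbed people in the cell (the convenience sample)
    positive: int    # y_h: positive swabs in the cell


def _check(strata):
    # Post-stratification needs at least one tested person in every cell.
    for s in strata:
        if s.tested <= 0:
            raise ValueError(f"empty cell: {s.label}")


def post_stratification_weights(strata):
    """Weight given to each tested person in cell h: (N_h / N) / (n_h / n)."""
    _check(strata)
    N = sum(s.population for s in strata)
    n = sum(s.tested for s in strata)
    return {s.label: (s.population / N) / (s.tested / n) for s in strata}


def naive_prevalence(strata):
    """Unweighted share of positives among all tested people."""
    return sum(s.positive for s in strata) / sum(s.tested for s in strata)


def post_stratified_prevalence(strata):
    """Reweighted prevalence: sum over cells of (N_h / N) * (y_h / n_h)."""
    _check(strata)
    N = sum(s.population for s in strata)
    return sum((s.population / N) * (s.positive / s.tested) for s in strata)


def estimated_lethality(deaths, prevalence, total_population):
    """Deaths divided by the estimated number of infected people (prevalence * N)."""
    return deaths / (prevalence * total_population)


# Hypothetical toy data, NOT figures from the paper: older people are
# over-represented among the tested, and their positivity rate is higher.
toy = [
    Stratum("young", population=600, tested=100, positive=5),
    Stratum("old", population=400, tested=300, positive=60),
]

if __name__ == "__main__":
    print(f"naive:           {naive_prevalence(toy):.4f}")
    print(f"post-stratified: {post_stratified_prevalence(toy):.4f}")
    print("weights:", post_stratification_weights(toy))
```

On these toy numbers the naive rate is 65/400 = 0.1625, while the reweighted rate is 0.6 · 0.05 + 0.4 · 0.20 = 0.11, because the over-sampled "old" cell is weighted down. Reweighting corrects only the age and gender mix of the tested group. If, within a cell, tested people are still more likely to be infected than untested ones (as with symptom-driven testing), the estimate remains biased.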