Toulis Panos
University of Chicago, Booth School of Business, United States of America.
J Econom. 2021 Jan;220(1):193-213. doi: 10.1016/j.jeconom.2020.10.005. Epub 2020 Oct 20.
We propose a partial identification method for estimating disease prevalence from serology studies. Our data are results from antibody tests in a population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space, and rejects parameter values using the joint data density as the test statistic. The proposed method is, in general, conservative for marginal inference, but its key advantage over more standard approaches is that it is valid in finite samples even when the underlying model is not point identified. Moreover, our method requires only independence of serology test results, and does not rely on asymptotic arguments, normality assumptions, or other approximations. We apply the method to recent Covid-19 serology studies in the US, and show that the parameter confidence set is generally wide and cannot support definitive conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%-2% (at the time of study), and are therefore inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the actual false positive rate of the antibody test were indeed near its empirical estimate (0.5%). In another study from New York state, Covid-19 prevalence is confidently estimated in the range 13%-17% in mid-April of 2020, which also suggests significant geographic variation in Covid-19 exposure across the US. Combining all datasets yields a 5%-8% prevalence range. Our results overall suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown.
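The scan-and-reject idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes a particular data layout that the abstract does not spell out: a negative-control sample (false positives), a positive-control sample (true positives), and a main study sample, each modeled as an independent binomial count. The function names (`binom_pmf`, `p_value`, `confidence_set`), the grid, and the demo counts are all hypothetical. For each candidate parameter value, the joint density of the observed counts serves as the test statistic, and its exact finite-sample p-value is obtained by brute-force enumeration of all outcomes, which is only practical for small samples.

```python
from itertools import product
from math import comb


def binom_pmf(n, p):
    """Return [P(X = k) for k = 0..n] with X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def p_value(theta, obs, sizes):
    """Exact p-value of theta = (fpr, tpr, prev), using the joint density
    of the data as the test statistic.

    obs   = (false positives among negative controls,
             true positives among positive controls,
             positives in the main study)          # assumed data layout
    sizes = the three corresponding sample sizes
    """
    fpr, tpr, prev = theta
    # Probability that a randomly sampled person tests positive.
    q = prev * tpr + (1 - prev) * fpr
    pmfs = [binom_pmf(n, p) for n, p in zip(sizes, (fpr, tpr, q))]
    f_obs = pmfs[0][obs[0]] * pmfs[1][obs[1]] * pmfs[2][obs[2]]
    cutoff = f_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    total = 0.0
    # Sum the probability of every outcome at most as likely as the observed one.
    for a in pmfs[0]:
        for b in pmfs[1]:
            ab = a * b
            for c in pmfs[2]:
                d = ab * c
                if d <= cutoff:
                    total += d
    return total


def confidence_set(obs, sizes, fprs, tprs, prevs, alpha=0.05):
    """Scan the grid and keep every parameter value that is not rejected."""
    return [
        theta
        for theta in product(fprs, tprs, prevs)
        if p_value(theta, obs, sizes) > alpha
    ]


if __name__ == "__main__":
    # Hypothetical counts, chosen only to keep the enumeration small.
    sizes = (20, 10, 30)
    obs = (1, 9, 3)
    fprs = [i / 20 for i in range(0, 5)]    # 0.00 .. 0.20
    tprs = [i / 20 for i in range(14, 21)]  # 0.70 .. 1.00
    prevs = [i / 20 for i in range(0, 11)]  # 0.00 .. 0.50
    kept = confidence_set(obs, sizes, fprs, tprs, prevs)
    kept_prevs = [theta[2] for theta in kept]
    print("prevalence range in confidence set:", min(kept_prevs), max(kept_prevs))
```

Reporting the smallest and largest prevalence among the retained grid points gives a projected range of the kind quoted in the abstract. The projection of a joint confidence set onto one parameter is what makes the marginal inference conservative.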