Department of Biology, University of Oxford, United Kingdom.
Department of Biology, University of Oxford, United Kingdom; Department of Computer Science, University of Oxford, United Kingdom.
Epidemics. 2022 Dec;41:100627. doi: 10.1016/j.epidem.2022.100627. Epub 2022 Sep 5.
SARS-CoV-2 case data are primary sources for estimating epidemiological parameters and for modelling the dynamics of outbreaks. Understanding biases within the case-based data sources used in epidemiological analyses is important, as these biases can detract from the value of these rich datasets. This raises the question of how variations in surveillance can affect the estimation of epidemiological parameters such as case growth rates. We use standardised line list data of COVID-19 from Argentina, Brazil, Mexico and Colombia to estimate delay distributions of symptom-onset-to-confirmation, -hospitalisation and -death, as well as hospitalisation-to-death, at high spatial resolutions and through time. Using these estimates, we model the biases introduced by the delay from symptom onset to confirmation on national and state level case growth rates (rt) using an adaptation of the Richardson-Lucy deconvolution algorithm. We find significant heterogeneities in the estimated delay distributions through time and space, with delay differences of up to 19 days between epochs at the state level. Further, we find that by changing the spatial scale, estimates of the case growth rate can vary by up to 0.13 per day. Lastly, we find that states with a high variance and/or mean delay in symptom-onset-to-diagnosis also have the largest difference between the rt estimated from raw and from deconvolved case counts at the state level. We highlight the importance of high-resolution case-based data in understanding biases in disease reporting and how these biases can be avoided by adjusting case numbers based on empirical delay distributions. Code and openly accessible data to reproduce the analyses presented here are available.
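To make the deconvolution step concrete, the following is a minimal sketch of a Richardson-Lucy style back-calculation of symptom-onset counts from confirmation counts, followed by a simple log-difference growth rate. It is not the authors' released code. The gamma-shaped delay distribution, its mean and standard deviation, the synthetic epidemic curve, the 7-day window, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def discretised_gamma(mean, sd, max_delay=30):
    """Hypothetical onset-to-confirmation delay: gamma density evaluated
    at day midpoints and normalised to sum to 1."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    x = np.arange(max_delay + 1) + 0.5
    dens = x ** (shape - 1) * np.exp(-x / scale)
    return dens / dens.sum()

def delay_matrix(pmf, n_days):
    """Matrix P with P[t, s] = pmf[t - s]: probability that an onset on
    day s is confirmed on day t (zero outside the delay support)."""
    pmf = np.asarray(pmf, dtype=float)
    lag = np.arange(n_days)[:, None] - np.arange(n_days)[None, :]
    valid = (lag >= 0) & (lag < len(pmf))
    return np.where(valid, pmf[np.clip(lag, 0, len(pmf) - 1)], 0.0)

def richardson_lucy(confirmed, pmf, n_iter=50):
    """Iteratively estimate onset counts whose delayed version matches
    the observed confirmation counts."""
    confirmed = np.asarray(confirmed, dtype=float)
    n_days = len(confirmed)
    P = delay_matrix(pmf, n_days)
    # q[s]: probability that an onset on day s is confirmed within the
    # observed window (corrects for right truncation).
    q = np.maximum(P.sum(axis=0), 1e-12)
    onset = np.full(n_days, confirmed.mean())  # flat initial guess
    for _ in range(n_iter):
        expected = P @ onset
        ratio = np.divide(confirmed, expected,
                          out=np.zeros(n_days), where=expected > 0)
        onset = onset * (P.T @ ratio) / q
    return onset

def growth_rate(counts, window=7):
    """Growth rate per day as the log-difference over a trailing window."""
    logc = np.log(np.maximum(np.asarray(counts, dtype=float), 1e-9))
    return (logc[window:] - logc[:-window]) / window

# Synthetic example (assumed values, not data from the study).
days = np.arange(100)
true_onsets = 1000.0 * np.exp(-0.5 * ((days - 40) / 8.0) ** 2) + 1.0
pmf = discretised_gamma(mean=7.0, sd=4.0)
confirmed = delay_matrix(pmf, len(days)) @ true_onsets
estimated_onsets = richardson_lucy(confirmed, pmf)
r_raw = growth_rate(confirmed)
r_deconvolved = growth_rate(estimated_onsets)
```

Comparing `r_raw` with `r_deconvolved` on the same dates gives a rough sense of how a reporting delay can shift and flatten growth rate estimates. With real line list data, the delay distribution would be estimated empirically per region and epoch rather than fixed as it is here.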