A flexible framework for local-level estimation of the effective reproductive number in geographic regions with sparse data.
Author information
Hossain Md Sakhawat, Goyal Ravi, Martin Natasha K, DeGruttola Victor, Chowdhury Mohammad Mihrab, McMahan Christopher, Rennert Lior
Affiliations
Department of Public Health Sciences, Clemson University, Clemson, SC, 29634, USA.
Center for Public Health Modeling and Response, Clemson University, Clemson, SC, USA.
Publication information
BMC Med Res Methodol. 2025 Mar 18;25(1):73. doi: 10.1186/s12874-025-02525-1.
BACKGROUND
Our research focuses on local-level estimation of the effective reproductive number, which describes the transmissibility of an infectious disease and represents the average number of individuals one infectious person infects at a given time. The ability to accurately estimate the infectious disease reproductive number in geographically granular regions is critical for disaster planning and resource allocation. However, not all regions have sufficient infectious disease outcome data; this lack of data presents a significant challenge for accurate estimation.
METHODS
To overcome this challenge, we propose a two-step approach. In step 1, existing estimation procedures for the effective reproductive number ($R_t$), namely EpiEstim, EpiFilter, and EpiNow2, are applied to geographic regions with sufficient data. In step 2, the resulting estimates are incorporated into a covariate-adjusted Bayesian Integrated Nested Laplace Approximation (INLA) spatial model to predict $R_t$ in regions with sparse or missing data. Our flexible framework effectively allows us to implement any existing estimation procedure for $R_t$ in regions with coarse or entirely missing data. We perform external validation and a simulation study to evaluate the proposed method and assess its predictive performance.
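The two-step idea can be illustrated with a minimal sketch; it is not the authors' implementation. For step 1, the sketch assumes a Cori-style renewal-equation estimator (the approach underlying EpiEstim) with a Gamma prior. For step 2, it swaps in a plain least-squares regression of log reproductive number on covariates as a stand-in for the INLA spatial model, so it has no spatial random effect and no Bayesian uncertainty. The function names, the 7-day window, and the prior values are assumptions made for illustration.

```python
import numpy as np

def cori_rt(incidence, si_pmf, window=7, prior_shape=1.0, prior_scale=5.0):
    """Step 1 (assumed estimator): posterior-mean reproductive number per day.

    si_pmf[k-1] is the probability that the serial interval equals k days.
    """
    incidence = np.asarray(incidence, dtype=float)
    si_pmf = np.asarray(si_pmf, dtype=float)
    n = len(incidence)
    # Total infectiousness: Lambda_t = sum_k I_{t-k} * w_k
    lam = np.zeros(n)
    for t in range(1, n):
        k = min(t, len(si_pmf))
        lam[t] = np.dot(incidence[t - k:t][::-1], si_pmf[:k])
    rt = np.full(n, np.nan)  # undefined until a full window is available
    for t in range(window, n):
        cases = incidence[t - window + 1:t + 1].sum()
        pressure = lam[t - window + 1:t + 1].sum()
        # Gamma posterior mean: (shape + cases) / (1/scale + pressure)
        rt[t] = (prior_shape + cases) / (1.0 / prior_scale + pressure)
    return rt

def fit_predict_rt(X_rich, rt_rich, X_sparse):
    """Step 2 (stand-in, NOT INLA): regress log R_t on covariates in data-rich
    regions, then predict R_t for regions with sparse or missing data."""
    X_rich = np.asarray(X_rich, dtype=float)
    X_sparse = np.asarray(X_sparse, dtype=float)
    A = np.column_stack([np.ones(len(X_rich)), X_rich])
    coef, *_ = np.linalg.lstsq(A, np.log(rt_rich), rcond=None)
    B = np.column_stack([np.ones(len(X_sparse)), X_sparse])
    return np.exp(B @ coef)
```

Modelling the logarithm keeps every prediction positive, which a reproductive number must be. Any other step 1 estimator could feed `fit_predict_rt` in the same way, which mirrors the plug-in design described in the paragraph.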
RESULTS
We applied our method to estimate $R_t$ using data from South Carolina (SC) counties and ZIP codes during the first COVID-19 wave ('Wave 1', June 16 to August 31, 2020) and the second wave ('Wave 2', December 16, 2020 to March 2, 2021). Among the three methods used in the first step, EpiNow2 yielded the highest accuracy of $R_t$ prediction in the regions with entirely missing data. With EpiNow2, median county-level percentage agreement (PA) was 90.9% (interquartile range, IQR: 89.9-92.0%) for Wave 1 and 92.5% (IQR: 91.6-93.4%) for Wave 2. Median ZIP code-level PA was 95.2% (IQR: 94.4-95.7%) for Wave 1 and 96.5% (IQR: 95.8-97.1%) for Wave 2. Using EpiEstim, EpiFilter, and an ensemble-based approach yielded median PA ranging from 81.9-90.0%, 87.2-92.1%, and 88.4-90.9%, respectively, across both waves and geographic granularities.
CONCLUSION
These findings demonstrate that the proposed methodology is a useful tool for small-area estimation of $R_t$, as our flexible framework yields high prediction accuracy for regions with coarse or missing data.