Jonathan Fintzi, Jon Wakefield, Vladimir N. Minin
Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Rockville, Maryland, USA.
Departments of Biostatistics and Statistics, University of Washington, Seattle, Washington, USA.
Biometrics. 2022 Dec;78(4):1530-1541. doi: 10.1111/biom.13538. Epub 2021 Sep 7.
Stochastic epidemic models (SEMs) fit to incidence data are critical to elucidating outbreak dynamics, shaping response strategies, and preparing for future epidemics. SEMs typically represent counts of individuals in discrete infection states using Markov jump processes (MJPs), but fitting them is computationally challenging because imperfect surveillance, lack of subject-level information, and temporal coarseness of the data obscure the true epidemic. Analytic integration over the latent epidemic process is impossible, and integration via Markov chain Monte Carlo (MCMC) is cumbersome due to the dimensionality and discreteness of the latent state space. Simulation-based computational approaches can address the intractability of the MJP likelihood, but are numerically fragile and prohibitively expensive for complex models. A linear noise approximation (LNA) that approximates the MJP transition density with a Gaussian density has been explored for analyzing prevalence data in large-population settings, but requires modification for analyzing incidence counts without assuming that the data are normally distributed. We demonstrate how to reparameterize SEMs to appropriately analyze incidence data, and fold the LNA into a data augmentation MCMC framework that outperforms deterministic methods statistically and simulation-based methods computationally. Our framework is computationally robust when the model dynamics are complex and applies to a broad class of SEMs. We evaluate our method in simulations that reflect Ebola, influenza, and SARS-CoV-2 dynamics, and apply our method to national surveillance counts from the 2013-2015 West Africa Ebola outbreak.
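To make the distinction between prevalence and incidence concrete, the following is a minimal sketch of an SIR-type Markov jump process simulated with the Gillespie algorithm, in which incidence is obtained as the increment of a cumulative infection counter between observation times. It is an illustration only, not the authors' LNA or MCMC framework: the function name `simulate_sir_incidence`, the SIR structure, and all parameter values are assumptions chosen for the example.

```python
import random


def simulate_sir_incidence(beta, gamma, s0, i0, obs_times, seed=0):
    """Gillespie simulation of a (hypothetical) SIR Markov jump process.

    Returns (incidence, (s, i)), where incidence[k] is the number of new
    infections in the interval (obs_times[k], obs_times[k + 1]] and (s, i)
    is the state at the last observation time.
    """
    rng = random.Random(seed)
    n = s0 + i0                  # closed population, no recovered at start
    s, i = s0, i0
    t = obs_times[0]
    cum_infections = 0           # counting process: infections so far
    cum_at_obs = [0]             # counter value at each observation time
    next_obs = 1

    while next_obs < len(obs_times):
        rate_inf = beta * s * i / n      # S -> I transition rate
        rate_rec = gamma * i             # I -> R transition rate
        total = rate_inf + rate_rec

        # Time of the next event; never, if no transitions are possible.
        t_next = t + rng.expovariate(total) if total > 0 else float("inf")

        # Record the counter at every observation time before that event.
        while next_obs < len(obs_times) and obs_times[next_obs] < t_next:
            cum_at_obs.append(cum_infections)
            next_obs += 1
        if next_obs == len(obs_times):
            break                        # event falls after the last observation

        # Apply the event: infection with probability rate_inf / total.
        t = t_next
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
            cum_infections += 1
        else:
            i -= 1

    # Incidence is the change in cumulative infections per interval,
    # not the number currently infected (prevalence).
    incidence = [b - a for a, b in zip(cum_at_obs, cum_at_obs[1:])]
    return incidence, (s, i)


if __name__ == "__main__":
    weekly, final_state = simulate_sir_incidence(
        beta=0.5, gamma=0.25, s0=990, i0=10, obs_times=list(range(0, 31))
    )
    print(weekly, final_state)
```

Because the counter only increases, the simulated incidence values are non-negative integers that sum to the total number of infections over the observation window. In practice the observed counts would additionally be subject to under-reporting, which this sketch does not model.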