Department of Zoology, University of Oxford, Oxford, United Kingdom.
MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom.
PLoS Comput Biol. 2022 Feb 11;18(2):e1009805. doi: 10.1371/journal.pcbi.1009805. eCollection 2022 Feb.
Inferring the dynamics of pathogen transmission during an outbreak is an important problem in infectious disease epidemiology. In mathematical epidemiology, estimates are often informed by time series of confirmed cases, while in phylodynamics genetic sequences of the pathogen, sampled through time, are the primary data source. Each type of data provides different, and potentially complementary, insight. Recent studies have recognised that combining data sources can improve estimates of the transmission rate and the number of infected individuals. However, inference methods are typically highly specialised and field-specific and are either computationally prohibitive or require intensive simulation, limiting their real-time utility. We present a novel birth-death phylogenetic model and derive a tractable analytic approximation of its likelihood, the computational complexity of which is linear in the size of the dataset. This approach combines epidemiological and phylodynamic data to produce estimates of key parameters of transmission dynamics and the unobserved prevalence. Using simulated data, we (a) show that the approximation agrees well with existing methods, (b) validate the claim of linear complexity and (c) explore robustness to model misspecification. This approximation facilitates inference on large datasets, which is increasingly important as large genomic sequence datasets become commonplace.
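The following minimal Python sketch shows the two data streams in concrete form. It simulates a linear birth-death process in which each removal is one of three kinds: unobserved, a sequenced sample (the phylodynamic data), or an unsequenced confirmed case (the epidemiological time series). This is an illustrative assumption, not the authors' model and not their likelihood approximation. The functions `simulate_birth_death` and `bin_counts` and all of the rate parameters are hypothetical. The sketch records only event times and prevalence. It does not record the genealogy linking the sequenced samples.

```python
import math
import random


def simulate_birth_death(birth_rate, death_rate, seq_rate, case_rate,
                         t_max, seed=None):
    """Gillespie simulation of a hypothetical linear birth-death process.

    Each infected individual independently transmits (birth_rate), is
    removed unobserved (death_rate), is removed as a sequenced sample
    (seq_rate) or is removed as an unsequenced confirmed case (case_rate).
    Returns (sequenced sample times, case times, prevalence at the end).
    """
    per_lineage = birth_rate + death_rate + seq_rate + case_rate
    if per_lineage <= 0:
        raise ValueError("at least one rate must be positive")
    rng = random.Random(seed)
    t, n = 0.0, 1  # start from a single infected individual
    sequenced, cases = [], []
    while n > 0:
        # Time to next event: exponential with total rate n * per_lineage.
        t += rng.expovariate(n * per_lineage)
        if t > t_max:
            break  # n is now the (unobserved) prevalence at t_max
        # Choose the event type in proportion to its rate.
        u = rng.random() * per_lineage
        if u < birth_rate:
            n += 1
        elif u < birth_rate + death_rate:
            n -= 1
        elif u < birth_rate + death_rate + seq_rate:
            n -= 1
            sequenced.append(t)
        else:
            n -= 1
            cases.append(t)
    return sequenced, cases, n


def bin_counts(times, t_max, width=1.0):
    """Aggregate event times in [0, t_max] into counts per interval."""
    n_bins = math.ceil(t_max / width)
    counts = [0] * n_bins
    for s in times:
        counts[min(int(s // width), n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    seq, cases, prevalence = simulate_birth_death(1.0, 0.5, 0.1, 0.2,
                                                  t_max=10.0, seed=42)
    print(len(seq), "sequenced samples;", prevalence, "still infected")
    print("daily case counts:", bin_counts(cases, 10.0))
```

In this toy setting the inference problem is to recover the rates and the final prevalence from `seq` and the binned case counts alone. The abstract states that the paper's approximation does this with cost linear in the size of the dataset.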