Unité Bioinformatique Evolutive, Institut Pasteur, Université de Paris, 28 rue du docteur Roux, 75015 Paris, France.
Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, 28 rue du docteur Roux, 75015 Paris, France.
Syst Biol. 2023 Dec 30;72(6):1387-1402. doi: 10.1093/sysbio/syad059.
Multi-type birth-death (MTBD) models are phylodynamic analogues of compartmental models in classical epidemiology. They are used to infer epidemiological parameters, such as the average number of secondary infections Re and the infectious time, from a phylogenetic tree (a genealogy of pathogen sequences). The members of this model family focus on different aspects of pathogen epidemics. For instance, the birth-death exposed-infectious (BDEI) model describes the transmission of pathogens with an incubation period (a delay between the moment of infection and becoming infectious, as for Ebola and SARS-CoV-2), and allows this period to be estimated along with the other parameters. With constantly growing sequencing data, MTBD models should be extremely useful for unravelling information on pathogen epidemics. However, existing implementations of these models in a phylodynamic framework have not yet caught up with the sequencing speed. Computing time and numerical instability issues limit their applicability to medium-sized data sets (≤ 500 samples), even though the accuracy of estimates should increase with more data. We propose a new, highly parallelizable formulation of the ordinary differential equations for MTBD models. We also extend these models to forests, to represent situations in which a (sub-)epidemic started from several cases (e.g., multiple introductions to a country). We implemented this formulation for the BDEI model in a maximum likelihood framework, using a combination of numerical analysis methods for efficient equation resolution. Our implementation estimates epidemiological parameter values and their confidence intervals in two minutes on a phylogenetic tree of 10,000 samples. Comparison with existing implementations on simulated data shows that it is not only much faster but also more accurate. An application of our tool to the 2014 Ebola epidemic in Sierra Leone is also convincing, with very fast calculation and precise estimates.
As MTBD models are closely related to Cladogenetic State Speciation and Extinction (ClaSSE)-like models, our findings could also be easily transferred to the macroevolution domain.
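To make the quantities named in the abstract concrete, the following minimal Python sketch shows how Re, the infectious time and the incubation period could be derived from the rates of a BDEI-type model. The abstract does not give these formulas or parameter names: the names `mu`, `la` and `psi`, and the relations Re = la / psi, infectious time = 1 / psi and incubation period = 1 / mu, are assumptions based on the usual convention for birth-death models with exponentially distributed waiting times. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BDEIRates:
    """Hypothetical container for BDEI rates (names are assumptions)."""
    mu: float   # rate of becoming infectious (exposed -> infectious)
    la: float   # transmission rate of an infectious individual
    psi: float  # removal rate of an infectious individual

    def __post_init__(self):
        # All rates must be strictly positive for the ratios below to make sense.
        for name in ("mu", "la", "psi"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    def reproduction_number(self) -> float:
        # Assumed relation: average number of secondary infections Re = la / psi.
        return self.la / self.psi

    def infectious_time(self) -> float:
        # Assumed relation: mean time spent in the infectious state = 1 / psi.
        return 1.0 / self.psi

    def incubation_period(self) -> float:
        # Assumed relation: mean time spent in the exposed state = 1 / mu.
        return 1.0 / self.mu


if __name__ == "__main__":
    # Illustrative values only, not estimates from the paper.
    rates = BDEIRates(mu=0.5, la=0.5, psi=0.25)
    print(rates.reproduction_number(), rates.infectious_time(), rates.incubation_period())
```

Under these assumptions, the rates are the quantities estimated from the tree, and the epidemiological summaries reported to the reader are simple functions of them.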