Medina Catalina M, Palacios Julia A, Minin Volodymyr M
Department of Statistics, University of California, Irvine, Irvine, California, United States of America.
Departments of Statistics and Biomedical Data Science, Stanford University, Stanford, California, United States of America.
PLoS Comput Biol. 2025 May 6;21(5):e1012970. doi: 10.1371/journal.pcbi.1012970. eCollection 2025 May.
The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. Under certain conditions, these changes in effective population size can be connected to changes in the number of infections in the population of interest. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferential testing of symptomatic individuals). Unfortunately, it takes weeks or months to sequence sampled pathogens and deposit the resulting genetic sequences into a database before they are available for such analyses. These reporting delays severely decrease the precision of phylodynamic estimates close to the present time, and for some models can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing policy decisions to be based on more recent data. Our work uses readily available historical times between sampling and reporting of sequenced samples for the population of interest, and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.
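The abstract describes turning historical sampling-to-reporting delays into information the sampling model can use. As a rough illustration only (not the authors' implementation), one simple way to do this is to estimate the empirical CDF F(d) of past delays, read it as the probability that a sample collected d days before the analysis time has already been reported, and thin a sampling intensity by that probability. The function names `empirical_report_prob` and `delay_adjusted_intensity`, the forward calendar-day time axis, and the multiplicative form lambda(t) * F(T - t) are all assumptions made for this sketch.

```python
from bisect import bisect_right


def empirical_report_prob(historical_delays, days_since_sampling):
    """Empirical CDF of past sampling-to-reporting delays (hypothetical helper).

    Returns the fraction of historical samples whose delay was
    <= days_since_sampling, i.e. an estimate of the probability that a
    sample collected that many days ago has already been reported.
    """
    delays = sorted(historical_delays)
    if not delays:
        raise ValueError("need at least one historical delay")
    # bisect_right counts how many delays are <= days_since_sampling
    return bisect_right(delays, days_since_sampling) / len(delays)


def delay_adjusted_intensity(base_intensity, sample_times, analysis_time, historical_delays):
    """Thin a sampling intensity by the reporting probability (assumed form).

    Assumes forward time in days, so a sample at time t has had
    analysis_time - t days to be reported; the adjusted intensity is
    base_intensity(t) * F(analysis_time - t).
    """
    return [
        base_intensity(t) * empirical_report_prob(historical_delays, analysis_time - t)
        for t in sample_times
    ]


if __name__ == "__main__":
    delays = [3, 7, 10, 14, 21, 30, 45, 60]  # made-up delays in days
    print(empirical_report_prob(delays, 14))  # 0.5
    print(delay_adjusted_intensity(lambda t: 2.0, [86, 100], 100, delays))  # [1.0, 0.0]
```

With these made-up delays, half of past samples were reported within 14 days, so a sample taken 14 days before the analysis time contributes only half its nominal sampling intensity, and a sample taken on the analysis day contributes none. This captures why estimates near the present lose precision when delays are ignored.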