Department of Biology, Pennsylvania State University, University Park, PA, USA.
J R Soc Interface. 2010 Jul 6;7(48):1119-27. doi: 10.1098/rsif.2009.0530. Epub 2010 Feb 10.
With more emphasis being put on global infectious disease monitoring, viral genetic data are being collected at an astounding rate, both within and without the context of a long-term disease surveillance plan. Concurrent with this increase have come improvements to the sophisticated and generalized statistical techniques used for extracting population-level information from genetic sequence data. However, little research has been done on how the collection of these viral sequence data can or does affect the efficacy of the phylogenetic algorithms used to analyse and interpret them. In this study, we use epidemic simulations to consider how the collection of viral sequence data clarifies or distorts the picture, provided by the phylogenetic algorithms, of the underlying population dynamics of the simulated viral infection over many epidemic cycles. We find that sampling protocols purposefully designed to capture sequences at specific points in the epidemic cycle, such as is done for seasonal influenza surveillance, lead to a significantly better view of the underlying population dynamics than do less-focused collection protocols. Our results suggest that the temporal distribution of samples can have a significant effect on what can be inferred from genetic data, and thus highlight the importance of considering this distribution when designing or evaluating protocols and analysing the data collected thereunder.
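The contrast between focused and less-focused collection protocols can be illustrated with a minimal sketch. It is not the simulation used in the study. It assumes a hypothetical epidemic that peaks once per cycle at mid-cycle, and the function names, the cycle length (`period`) and the sampling window (`window`) are all illustrative assumptions. The sketch only generates sample collection times under the two kinds of protocol and measures how many fall near an epidemic peak. It performs no phylogenetic inference.

```python
import random

# Assumption: each epidemic cycle lasts `period` time units and peaks at its midpoint.

def uniform_sampling(n, t_end, rng):
    """Less-focused protocol: n collection times spread uniformly over [0, t_end]."""
    return sorted(rng.uniform(0, t_end) for _ in range(n))

def peak_focused_sampling(n, t_end, rng, period=1.0, window=0.1):
    """Focused protocol: n collection times within +/- window of a cycle peak."""
    n_cycles = int(t_end / period)
    times = []
    for _ in range(n):
        k = rng.randrange(n_cycles)          # pick a cycle at random
        peak = (k + 0.5) * period            # assumed peak time of that cycle
        times.append(rng.uniform(peak - window, peak + window))
    return sorted(times)

def fraction_near_peak(times, period=1.0, window=0.1, tol=1e-9):
    """Share of collection times lying within +/- window of an assumed peak."""
    near = sum(
        1 for t in times
        if abs((t % period) - 0.5 * period) <= window + tol
    )
    return near / len(times)

if __name__ == "__main__":
    rng = random.Random(0)
    uniform_times = uniform_sampling(1000, 10.0, rng)
    focused_times = peak_focused_sampling(1000, 10.0, rng)
    print("uniform protocol, fraction near peak:", fraction_near_peak(uniform_times))
    print("focused protocol, fraction near peak:", fraction_near_peak(focused_times))
```

Under these assumptions the focused protocol places every sample near a peak, while the uniform protocol places only about a fifth of them there. This is the kind of difference in temporal distribution whose effect on inference the abstract describes.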