Department of Zoology, University of Oxford, United Kingdom.
Infect Genet Evol. 2010 Apr;10(3):421-30. doi: 10.1016/j.meegid.2009.06.001. Epub 2009 Jun 11.
At present, most analyses that aim to detect the action of natural selection upon viral gene sequences use phylogenetic estimates of the ratio of silent to replacement mutations. Such methods, however, are impractical to compute on large data sets comprising hundreds of complete viral genomes, which are becoming increasingly common due to advances in genome sequencing technology. Here we investigate the statistical performance of computationally efficient tests that are based on sequence summary statistics, and explore their applicability to RNA virus data sets in two ways. First, we perform extensive simulations in order to measure the type I error rate of two well-known summary statistic methods (Tajima's D and the McDonald-Kreitman test) under a range of virus-like mutational and demographic scenarios. Second, we apply these methods to a compilation of approximately 100 alignments that represent natural RNA virus populations. In addition, we develop and introduce a new implementation of the McDonald-Kreitman test and show that it greatly improves the test's statistical reliability on typical viral data sets. Our results suggest that variants of the McDonald-Kreitman test could prove useful in the analysis of very large sets of highly diverse viral genetic data.
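For readers unfamiliar with the McDonald-Kreitman test, the following is a minimal sketch of its classical textbook form: a 2x2 contingency table of replacement and silent changes, split into fixed differences between populations and polymorphisms within one, evaluated with Fisher's exact test. It is not the new implementation introduced in this paper, whose details the abstract does not give. The function names (`fisher_exact_two_sided`, `mk_test`) and the example counts are hypothetical and chosen for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


def mk_test(dn, ds, pn, ps):
    """Classical McDonald-Kreitman test (illustrative sketch).

    dn, ds: fixed replacement and silent differences between populations.
    pn, ps: replacement and silent polymorphisms within a population.
    Returns (neutrality_index, p_value); the index is NaN when undefined.
    """
    if ps == 0 or dn == 0 or ds == 0:
        neutrality_index = float("nan")
    else:
        # NI = (Pn / Ps) / (Dn / Ds); NI < 1 indicates an excess of fixed
        # replacement changes, NI > 1 an excess of replacement polymorphism.
        neutrality_index = (pn / ps) / (dn / ds)
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return neutrality_index, p_value


# Hypothetical counts, for illustration only.
ni, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"neutrality index = {ni:.3f}, Fisher's exact p = {p:.4f}")
```

Because the test needs only four counts per alignment, its cost grows with the number of sites rather than with the size of a phylogeny, which is the kind of computational efficiency the abstract attributes to summary-statistic methods.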