Drummond Alexei J, Suchard Marc A
Bioinformatics Institute, University of Auckland, Auckland, New Zealand.
BMC Genet. 2008 Oct 31;9:68. doi: 10.1186/1471-2156-9-68.
Many data summary statistics have been developed to detect departures from the neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, which makes the results hard to interpret.
Here we harness posterior predictive simulation, exploiting summary statistics of both the data and the model parameters, to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate its utility on four real data sets, identifying significant departures from neutrality in human influenza A virus even after controlling for variation in population size.
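The general logic of a posterior predictive check can be illustrated with a deliberately simple stand-in model. The sketch below is not the authors' implementation and does not use genealogies or substitution models; it assumes a toy Poisson model with a conjugate Gamma prior so that posterior draws are exact. For each posterior draw of the parameter, a replicate data set is simulated and a statistic T is computed on both the replicate and the observed data; the posterior predictive p-value is the fraction of draws where the replicate statistic is at least as extreme. One statistic (`chi2_discrepancy`) depends on both the data and the parameter, the other (`dispersion_index`) on the data alone. All function and parameter names are hypothetical.

```python
import numpy as np


def chi2_discrepancy(y, lam):
    # Hypothetical statistic T(y, lam) that depends on BOTH the data and the parameter.
    return float(np.sum((y - lam) ** 2 / lam))


def dispersion_index(y, lam):
    # Hypothetical data-only statistic (lam is ignored): variance-to-mean ratio.
    return float(np.var(y) / max(np.mean(y), 1e-12))


def posterior_predictive_pvalues(y, stats, n_draws=4000,
                                 prior_shape=1.0, prior_rate=1.0, seed=0):
    """Posterior predictive p-values under a toy Poisson-Gamma model (assumption).

    y      : observed counts, assumed i.i.d. Poisson(lam)
    stats  : dict mapping a name to a statistic T(y, lam)
    Returns a dict mapping each name to Pr(T(y_rep, lam) >= T(y, lam) | y).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = y.size
    # Conjugate update: Gamma(shape, rate) prior -> Gamma(shape + sum(y), rate + n).
    post_shape = prior_shape + y.sum()
    post_rate = prior_rate + n
    # NumPy parameterises the gamma by shape and scale = 1 / rate.
    lams = rng.gamma(post_shape, 1.0 / post_rate, size=n_draws)

    exceed = {name: 0 for name in stats}
    for lam in lams:
        # Simulate a replicate data set from the posterior predictive distribution.
        y_rep = rng.poisson(lam, size=n)
        for name, stat in stats.items():
            if stat(y_rep, lam) >= stat(y, lam):
                exceed[name] += 1
    return {name: count / n_draws for name, count in exceed.items()}


if __name__ == "__main__":
    stats = {"chi2": chi2_discrepancy, "dispersion": dispersion_index}
    # Strongly overdispersed counts: the Poisson model should fit poorly.
    overdispersed = [0] * 10 + [20] * 10
    print(posterior_predictive_pvalues(overdispersed, stats))
```

Small p-values flag aspects of the data that the fitted model fails to reproduce. Because the p-values are estimated purely by simulation, no analytical expectation or variance of either statistic is required, and several statistics are evaluated on the same set of replicates.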
Importantly, by employing a fully model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, potentially increasing sensitivity. Furthermore, our method remains useful in situations where the analytical expectations and variances of summary statistics are not available. This property has great potential for the analysis of temporally spaced data, an expanding area previously neglected owing to the limited availability of theory and methods.