Department of Organismic and Evolutionary Biology, Harvard University, Boston, MA, USA.
Mol Biol Evol. 2010 Jan;27(1):73-89. doi: 10.1093/molbev/msp209.
A large number of statistical tests have been proposed to detect natural selection based on a sample of variation at a single genetic locus. These tests measure the deviation of the allelic frequency distribution observed within populations from the distribution expected under a set of assumptions that includes both neutral evolution and equilibrium population demography. The present study considers a new way to assess the statistical properties of these tests of selection, by their behavior in response to direct perturbations of the steady-state allelic frequency distribution, unconstrained by any particular nonequilibrium demographic scenario. Results from Monte Carlo computer simulations indicate that most tests of selection are more sensitive to perturbations of the allele frequency distribution that increase the variance in allele frequencies than to perturbations that decrease the variance. Simulations also demonstrate that it requires, on average, 4N generations (N is the diploid effective population size) for tests of selection to relax to their theoretical, steady-state distributions following different perturbations of the allele frequency distribution to its extremes. This relatively long relaxation time highlights the fact that these tests are not robust to violations of the other assumptions of the null model besides neutrality. Lastly, genetic variation arising under an example of a regularly cycling demographic scenario is simulated. Tests of selection performed on this last set of simulated data confirm the confounding nature of these tests for the inference of natural selection, under a demographic scenario that likely holds for many species. The utility of using empirical, genomic distributions of test statistics, instead of the theoretical steady-state distribution, is discussed as an alternative for improving the statistical inference of natural selection.
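As an illustration of the kind of test described above, the following minimal sketch computes Tajima's D from an unfolded site frequency spectrum. The abstract does not name the specific tests examined, so Tajima's D is used here only as a familiar example of a statistic that compares an observed allele frequency distribution with its neutral, equilibrium expectation. The function name `tajimas_d` and the input layout are assumptions made for this sketch.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum (illustrative sketch).

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is carried by i of the n sampled sequences, for i = 1 .. n - 1.
    The input layout is an assumption of this example, not the study's.
    """
    n = len(sfs) + 1
    if n < 4:
        # The variance term is zero for n = 2 and n = 3, so D is undefined.
        raise ValueError("need at least 4 sampled sequences")

    s = sum(sfs)  # number of segregating sites
    if s == 0:
        return float("nan")  # no variation, D is undefined

    # Mean number of pairwise differences, computed from the spectrum.
    pairs = n * (n - 1) / 2
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Spectrum proportional to 1/i (the neutral equilibrium expectation), n = 6.
    print(tajimas_d([60, 30, 20, 15, 12]))
    # Excess of rare variants.
    print(tajimas_d([10, 0, 0, 0, 0]))
    # Excess of intermediate-frequency variants.
    print(tajimas_d([0, 0, 10, 0, 0]))
```

A spectrum proportional to 1/i gives D close to zero, an excess of rare variants gives a negative D, and an excess of intermediate-frequency variants gives a positive D. Significance would normally be judged against a simulated or empirical distribution of the statistic, which this sketch does not attempt.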