Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104.
Genetics. 2014 Feb;196(2):509-22. doi: 10.1534/genetics.113.158220. Epub 2013 Dec 6.
Both genetic drift and natural selection cause the frequencies of alleles in a population to vary over time. Discriminating between these two evolutionary forces, based on a time series of samples from a population, remains an outstanding problem with increasing relevance to modern data sets. Even in the idealized situation where the sampled locus is independent of all other loci, this problem is difficult to solve, especially when the size of the population from which the samples are drawn is unknown. A standard χ²-based likelihood-ratio test was previously proposed to address this problem. Here we show that the χ²-test of selection substantially underestimates the probability of type I error, leading to more false positives than indicated by its P-value, especially at stringent P-values. We introduce two methods to correct this bias. The empirical likelihood-ratio test (ELRT) rejects neutrality when the likelihood-ratio statistic falls in the tail of the empirical distribution obtained under the most likely neutral population size. The frequency increment test (FIT) rejects neutrality if the distribution of normalized allele-frequency increments exhibits a mean that deviates significantly from zero. We characterize the statistical power of these two tests for selection, and we apply them to three experimental data sets. We demonstrate that both ELRT and FIT have power to detect selection in practical parameter regimes, such as those encountered in microbial evolution experiments. Our analysis applies to a single diallelic locus, assumed independent of all other loci, which is most relevant to full-genome selection scans in sexual organisms, and also to evolution experiments in asexual organisms as long as clonal interference is weak. Different techniques will be required to detect selection in time series of cosegregating linked loci.
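The frequency increment test described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the normalized increments take the form Y_i = (ν_i − ν_{i−1}) / sqrt(2 ν_{i−1}(1 − ν_{i−1})(t_i − t_{i−1})) and uses a one-sample Student t-test of H0: mean(Y) = 0. Because the t statistic divides by the sample standard deviation of the increments, no population-size estimate is needed, and any constant factor in the rescaling cancels. The helper names (`rescaled_increments`, `frequency_increment_test`, `simulate_haploid_wf`) are hypothetical, as is the haploid Wright-Fisher demo with selection coefficient `s`. The demo observes population frequencies directly and ignores finite-sample noise. The t-distribution tail comes from the regularized incomplete beta function, so only the standard library is needed.

```python
import math
import random
from statistics import mean, stdev


def rescaled_increments(freqs, times):
    """Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1})), assumed form."""
    ys = []
    for i in range(1, len(freqs)):
        prev, dt = freqs[i - 1], times[i] - times[i - 1]
        if not 0.0 < prev < 1.0 or dt <= 0:
            raise ValueError("need 0 < frequency < 1 and strictly increasing times")
        ys.append((freqs[i] - prev) / math.sqrt(2.0 * prev * (1.0 - prev) * dt))
    return ys


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd continued-fraction coefficients for step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided Student-t tail probability P(|T| >= |t|) with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def frequency_increment_test(freqs, times):
    """One-sample t-test of H0: mean(Y) = 0; returns (t statistic, two-sided P)."""
    ys = rescaled_increments(freqs, times)
    if len(ys) < 2:
        raise ValueError("need at least three sampled time points")
    sd = stdev(ys)
    if sd == 0.0:
        raise ValueError("increments have zero variance")
    t = mean(ys) / (sd / math.sqrt(len(ys)))
    return t, t_two_sided_p(t, len(ys) - 1)


def simulate_haploid_wf(p0, pop_size, s, generations, sample_every, seed=0):
    """Hypothetical demo: haploid Wright-Fisher model with selection coefficient s."""
    rng = random.Random(seed)
    p, freqs, times = p0, [p0], [0]
    for g in range(1, generations + 1):
        p_sel = p * (1.0 + s) / (1.0 + s * p)  # deterministic selection
        p = sum(rng.random() < p_sel for _ in range(pop_size)) / pop_size  # drift
        if g % sample_every == 0:
            freqs.append(p)
            times.append(g)
    return freqs, times


if __name__ == "__main__":
    for s in (0.0, 0.02):
        f, ts = simulate_haploid_wf(0.5, 10_000, s, 100, 10, seed=1)
        t, p = frequency_increment_test(f, ts)
        print(f"s = {s}: t = {t:.2f}, two-sided P = {p:.3g}")
```

Running the script prints a t statistic and two-sided P-value for a neutral trajectory and for a selected one (s = 0.02). The selected trajectory should typically yield a much smaller P-value. The sketch treats successive increments as independent, which matches the single independent diallelic locus the abstract considers, not cosegregating linked loci.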