Department of Statistics, University of Oxford, OX1 3LB, United Kingdom
School of Biological Sciences, University of Bristol, BS8 1TQ, United Kingdom
Genetics. 2020 Oct;216(2):463-480. doi: 10.1534/genetics.120.303400. Epub 2020 Aug 7.
Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating the selection coefficient and the allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backward in time while reweighting the solution with the emission probabilities of the observations at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface, over both the selection coefficient and the allele age, to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and nonconstant demographic histories. We apply our approach to reanalyze ancient DNA data associated with horse base coat colors. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.
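The backward recursion with emission reweighting can be illustrated with a toy discrete analogue. The sketch below is not the authors' implementation. It replaces the conditioned Wright-Fisher diffusion with a small discrete Wright-Fisher chain, assumes binomial sampling as the emission model, and assumes the allele starts as a single copy at the candidate allele age (the very assumption the paper's formulation avoids). The function names, the population size `N`, and the example counts are all hypothetical. What it does show is why one backward pass per selection coefficient suffices: the pass returns the likelihood for every candidate allele age at once, leaving only a one-dimensional search over `s`.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def transition_matrix(N, s):
    """Toy Wright-Fisher chain on N gene copies with genic selection s (assumed model)."""
    P = []
    for i in range(N + 1):
        x = i / N
        p = x * (1.0 + s) / (1.0 + s * x)  # allele frequency after selection
        P.append([binom_pmf(j, N, p) for j in range(N + 1)])
    return P

def backward_pass(samples, N, s, t_min):
    """Return {t: P(all samples from generation t onward | one mutant copy at t)}.

    samples maps generation -> (mutant copies observed, sample size).
    """
    P = transition_matrix(N, s)
    t_last = max(samples)
    beta = [1.0] * (N + 1)
    lik = {}
    for t in range(t_last, t_min - 1, -1):
        if t < t_last:
            # One step backward in time: beta_t = P @ beta_{t+1}.
            beta = [sum(P[i][j] * beta[j] for j in range(N + 1))
                    for i in range(N + 1)]
        if t in samples:
            # Reweight by the emission probability of the observation at t.
            k, n = samples[t]
            beta = [b * binom_pmf(k, n, i / N) for i, b in enumerate(beta)]
        lik[t] = beta[1]  # state 1 = a single mutant copy
    return lik

def profile_likelihood(samples, N, s_grid, t_min):
    """1-D search over s; the best allele age for each s comes from one pass."""
    t_first = min(samples)
    best = None
    for s in s_grid:
        lik = backward_pass(samples, N, s, t_min)
        # Only ages at or before the first sample see the whole data set.
        t0 = max(range(t_min, t_first + 1), key=lambda t: lik[t])
        cand = (math.log(lik[t0]), s, t0)
        if best is None or cand > best:
            best = cand
    return best  # (log-likelihood, selection coefficient, allele age)

if __name__ == "__main__":
    # Hypothetical counts: generation -> (mutant copies, sample size).
    samples = {0: (1, 20), 10: (4, 20), 20: (9, 20), 30: (15, 20)}
    s_grid = [i / 100 for i in range(-10, 31, 2)]
    print(profile_likelihood(samples, N=50, s_grid=s_grid, t_min=-30))
```

The grid search stands in for whatever one-dimensional optimizer is preferred. A nonconstant demographic history could be mimicked in this toy setting by building a separate transition matrix for each generation, at the cost of more computation.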