Jewett Ethan M, Rosenberg Noah A
Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA 94305-5020, USA.
Theor Popul Biol. 2014 May;93:14-29. doi: 10.1016/j.tpb.2013.12.007. Epub 2014 Jan 7.
Under the coalescent model, the random number n_t of lineages ancestral to a sample is nearly deterministic as a function of time when n_t is moderate to large in value, and it is well approximated by its expectation E[n_t]. In turn, this expectation is well approximated by simple deterministic functions that are easy to compute. Such deterministic functions have been applied to estimate allele age, effective population size, and genetic diversity, and they have been used to study properties of models of infectious disease dynamics. Although a number of simple approximations of E[n_t] have been derived and applied to problems of population-genetic inference, the theoretical accuracy of the resulting approximate formulas and of the inferences obtained using these approximations is not known, and the range of problems to which they can be applied is not well understood. Here, we demonstrate general procedures by which the approximation n_t ≈ E[n_t] can be used to reduce the computational complexity of coalescent formulas, and we show that the resulting approximations converge to their true values under simple assumptions. Such approximations provide alternatives to exact formulas that are computationally intractable or numerically unstable when the number of sampled lineages is moderate or large. We also extend an existing class of approximations of E[n_t] to the case of multiple populations of time-varying size with migration among them. Our results facilitate the use of the deterministic approximation n_t ≈ E[n_t] for deriving functionally simple, computationally efficient, and numerically stable approximations of coalescent formulas under complicated demographic scenarios.
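As an illustration of the approximation n_t ≈ E[n_t], the Python sketch below compares three quantities. It assumes a single population of constant size, with time t measured in coalescent units so that each pair of lineages coalesces at rate 1. The three quantities are the exact expectation E[n_t] from the classical formula of Tavaré (1984), the deterministic curve obtained by solving dn/dt = -n(n-1)/2 with n(0) = n, and the mean of simulated draws of n_t. This is a minimal sketch under those assumptions, not the authors' procedure. The function names are hypothetical, and the multi-population, time-varying-size extension described above is not covered.

```python
import math
import random
import statistics


def expected_lineages_exact(n: int, t: float) -> float:
    """Exact E[n_t] for the standard coalescent (Tavare, 1984).

    Assumes one population of constant size; t is in coalescent units
    (each pair of lineages coalesces at rate 1).
    E[n_t] = sum_{i=1}^{n} exp(-i(i-1)t/2) * (2i-1) * n_[i] / n_(i),
    where n_[i] is the falling and n_(i) the rising factorial of n.
    """
    total = 0.0
    ratio = 1.0  # running value of n_[i] / n_(i); avoids huge factorials
    for i in range(1, n + 1):
        ratio *= (n - i + 1) / (n + i - 1)
        total += math.exp(-i * (i - 1) * t / 2.0) * (2 * i - 1) * ratio
    return total


def expected_lineages_deterministic(n: int, t: float) -> float:
    """Closed-form solution of dn/dt = -n(n-1)/2 with n(0) = n."""
    return n / (n + (1 - n) * math.exp(-t / 2.0))


def simulate_lineages(n: int, t: float, rng: random.Random) -> int:
    """Draw n_t by simulating coalescence times back to time t."""
    k, elapsed = n, 0.0
    while k > 1:
        # With k lineages, the waiting time to the next coalescence is
        # exponential with rate k(k-1)/2.
        elapsed += rng.expovariate(k * (k - 1) / 2.0)
        if elapsed > t:
            break  # k lineages are still present at time t
        k -= 1
    return k


if __name__ == "__main__":
    rng = random.Random(2014)
    n0 = 100
    for t in (0.01, 0.05, 0.1, 0.5):
        draws = [simulate_lineages(n0, t, rng) for _ in range(2000)]
        print(
            f"t={t:<5} exact={expected_lineages_exact(n0, t):7.3f} "
            f"approx={expected_lineages_deterministic(n0, t):7.3f} "
            f"sim mean={statistics.mean(draws):7.3f} "
            f"sd={statistics.stdev(draws):5.3f}"
        )
```

For n = 100, the simulated draws should cluster around E[n_t] at small t, and the deterministic curve should track the exact expectation closely. Because E[n(n-1)] ≥ E[n](E[n] - 1), the deterministic solution lies slightly above the exact expectation.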