Melfi Andrew, Viswanath Divakar
Department of Mathematics, University of Michigan, United States.
Theor Popul Biol. 2018 Dec;124:81-92. doi: 10.1016/j.tpb.2018.09.005. Epub 2018 Oct 9.
The first terms of the Wright-Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (the singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. Part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its approximately Poisson offspring distribution lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance between WF and the coalescent is 0.12/log N for a population-sized sample with n=N, which is only 1% for N=2×10. The coalescent remains a good approximation for the site frequency spectrum of large samples.
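The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It assumes the standard coalescent result that, for a sample of size n under infinite sites and constant population size, a segregating site has j mutant copies with probability proportional to 1/j. The function names `coalescent_sfs`, `total_variation`, and `tv_estimate` are hypothetical, and the use of the natural logarithm in 0.12/log N is an assumption, since the abstract does not state the base.

```python
import math


def coalescent_sfs(n):
    """Normalized coalescent site frequency spectrum for a sample of size n.

    Entry j-1 is the probability that a segregating site carries j mutant
    copies (j = 1..n-1), assuming the standard 1/j form of the spectrum.
    """
    weights = [1.0 / j for j in range(1, n)]
    total = sum(weights)  # harmonic number H_{n-1}
    return [w / total for w in weights]


def total_variation(p, q):
    """Total variation distance between two spectra of equal length."""
    if len(p) != len(q):
        raise ValueError("spectra must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_estimate(N):
    """Leading-order distance 0.12 / log N quoted in the abstract for n = N.

    Assumption: log is the natural logarithm.
    """
    return 0.12 / math.log(N)


if __name__ == "__main__":
    sfs = coalescent_sfs(5)
    # Singleton probability is 1 / H_4 = 12/25 for n = 5.
    print("singleton probability, n=5:", sfs[0])
    for N in (10**3, 10**4, 10**5, 10**6):
        print(N, tv_estimate(N))
```

Under these assumptions the coalescent singleton probability is 1/H_{n-1}, so a WF spectrum with an elevated singleton probability and a lowered remainder could be compared against `coalescent_sfs(n)` using `total_variation`.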