Desai Michael M, Plotkin Joshua B
Department of Biology and Program in Applied Mathematics and Computation Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Genetics. 2008 Dec;180(4):2175-91. doi: 10.1534/genetics.108.087361. Epub 2008 Oct 14.
The distribution of genetic polymorphisms in a population contains information about evolutionary processes. The Poisson random field (PRF) model uses the polymorphism frequency spectrum to infer the mutation rate and the strength of directional selection. The PRF model relies on an infinite-sites approximation that is reasonable for most eukaryotic populations but becomes problematic when the population-scaled mutation rate θ is large (θ ≳ 0.05). Here, we show that at the large mutation rates characteristic of microbes and viruses, the infinite-sites approximation of the PRF model induces systematic biases that lead it to underestimate negative selection pressures and mutation rates and to erroneously infer positive selection. We introduce two new methods that extend our ability to infer selection pressures and mutation rates at large θ: a finite-site modification of the PRF model and a new technique based on diffusion theory. Our methods can be used to infer not only a "weighted average" of selection pressures acting on a gene sequence, but also the distribution of selection pressures across sites. We evaluate the accuracy of our methods, as well as that of the original PRF approach, by comparison with Wright-Fisher simulations.
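To make the quantity being fitted concrete: the original infinite-sites PRF model predicts, from θ and a scaled selection coefficient γ, the expected number of sites at which i of n sampled sequences carry the derived allele. The sketch below evaluates the standard Sawyer-Hartl form of that expectation, f(x) = θ·(1 − e^(−2γ(1−x))) / ((1 − e^(−2γ))·x(1−x)), integrated against binomial sampling. It is only the infinite-sites baseline, not the finite-site modification or the diffusion-based method introduced in this paper. The function names, the exact scaling of γ, and the use of Simpson's rule for quadrature are illustrative assumptions.

```python
import math


def selection_factor(x: float, gamma: float) -> float:
    """g(x) in the PRF density f(x) = theta * g(x) / (x * (1 - x)).

    g(x) = (1 - exp(-2*gamma*(1 - x))) / (1 - exp(-2*gamma)).
    gamma > 0 favours the derived allele, gamma < 0 opposes it, 0 is neutral.
    """
    if gamma == 0.0:
        return 1.0 - x  # neutral limit
    if gamma > 0:
        return math.expm1(-2 * gamma * (1 - x)) / math.expm1(-2 * gamma)
    # Algebraically identical form, (e^{2gx} - e^{2g}) / (1 - e^{2g}),
    # which avoids overflow for strongly negative gamma.
    return (math.expm1(2 * gamma * x) - math.expm1(2 * gamma)) / -math.expm1(2 * gamma)


def expected_sfs(theta: float, gamma: float, n: int, steps: int = 2000) -> list[float]:
    """Expected number of sites with i derived copies, i = 1..n-1 (infinite sites).

    Integrates C(n, i) x^i (1-x)^(n-i) * f(x) over [0, 1] with composite
    Simpson's rule; `steps` must be even.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for k in range(steps + 1):
            x = k * h
            # The 1/(x(1-x)) factor is cancelled against x^i (1-x)^(n-i),
            # so the integrand stays finite at both endpoints.
            value = binom * x ** (i - 1) * (1 - x) ** (n - i - 1) * selection_factor(x, gamma)
            weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
            total += weight * value
        sfs.append(theta * total * h / 3)
    return sfs
```

Under neutrality (γ = 0) the expectation reduces to θ/i, which gives a quick sanity check. Negative γ shifts sites toward rare derived variants and positive γ toward common ones; it is this shape that the PRF likelihood matches against the observed spectrum.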