Rousset François, Leblois Raphaël
Université Montpellier 2, CNRS, Institut des Sciences de l'Evolution, France.
Mol Biol Evol. 2007 Dec;24(12):2730-45. doi: 10.1093/molbev/msm206. Epub 2007 Sep 24.
We evaluate the performance of maximum likelihood (ML) analysis of allele frequency data in a linear array of populations. The parameters are a mutation rate and either the dispersal rate in a stepping stone model or a dispersal rate and a scale parameter in a geometric dispersal model. An approximate procedure known as maximum product of approximate conditional (PAC) likelihood is found to perform as well as ML. Mis-specification biases may occur because the importance sampling algorithm is formally defined in terms of mutation and migration rates scaled by the total size of the population, and this size may differ widely between the statistical model and reality. As expected, ML generally performs well when the statistical model is correctly specified. Otherwise, when the mutation probability is high and dispersal is most limited, mutation rate estimates are much closer to the mutation probability scaled by the number of demes in the statistical model than to the mutation probability scaled by the number of demes in reality. This mis-specification bias actually has practical benefits. However, the opposite results are found under the opposite conditions. Migration rate estimates show roughly similar trends, but they may not always be easily interpreted as low-bias estimates of the dispersal rate under any scaling. Estimation of the dispersal scale parameter is also affected by mis-specification of the number of demes, and the different biases compensate for each other in such a way that the so-called neighborhood size (more precisely, the product of population density and mean-squared parent-offspring dispersal distance) is well estimated. Results congruent with these findings are found in an application to a damselfly data set.
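To make the link between the dispersal parameters and the neighborhood quantity concrete, the sketch below computes the mean-squared parent-offspring distance σ² for a geometric dispersal kernel on a line, and the product Dσ². It assumes one common parameterization, which may not match the paper's exact normalization or its handling of the ends of a finite array: an individual stays in its deme with probability 1 − m, and moves |k| ≥ 1 demes with probability (m/2)(1 − g)g^(|k|−1). Under this assumed kernel σ² = m(1 + g)/(1 − g)², and setting g = 0 recovers the stepping stone model (σ² = m). The function names and the density value are hypothetical.

```python
def geometric_dispersal_probs(m, g, kmax):
    """Assumed untruncated geometric dispersal kernel on a line (hypothetical form).

    P(0) = 1 - m; for |k| >= 1, P(k) = (m / 2) * (1 - g) * g**(|k| - 1).
    With g = 0 only moves of +/-1 remain (stepping stone model).
    Distances are cut off at |k| <= kmax so the kernel can be summed numerically.
    """
    probs = {0: 1.0 - m}
    for k in range(1, kmax + 1):
        p = 0.5 * m * (1.0 - g) * g ** (k - 1)
        probs[k] = p
        probs[-k] = p
    return probs


def sigma2_numeric(m, g, kmax=10_000):
    """Mean-squared parent-offspring distance (in deme spacings), by direct summation."""
    return sum(k * k * p for k, p in geometric_dispersal_probs(m, g, kmax).items())


def sigma2_closed_form(m, g):
    """Closed form for the assumed kernel.

    If K is geometric on {1, 2, ...} with parameter 1 - g, then E[K^2] = (1 + g) / (1 - g)^2,
    and a fraction m of individuals disperse, so sigma^2 = m * E[K^2].
    """
    return m * (1.0 + g) / (1.0 - g) ** 2


def neighborhood(density, sigma2):
    """Product D * sigma^2: population density times mean-squared dispersal distance."""
    return density * sigma2


if __name__ == "__main__":
    # Hypothetical values: m = 0.2, g = 0.5, density of 100 individuals per deme spacing.
    s2 = sigma2_closed_form(0.2, 0.5)
    print(s2, sigma2_numeric(0.2, 0.5), neighborhood(100, s2))
```

Because σ² depends on m and g only through the combination m(1 + g)/(1 − g)², different (m, g) pairs can give nearly the same σ², and hence the same Dσ². This is one way to see how biases in the separate dispersal estimates can partly cancel while the neighborhood quantity is still estimated well.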