Chapuis Marie-Pierre, Estoup Arnaud
Centre de Biologie et de Gestion des Populations, Institut National pour la Recherche Agronomique, Campus International de Baillarguet, Montferrier/Lez, France.
Mol Biol Evol. 2007 Mar;24(3):621-31. doi: 10.1093/molbev/msl191. Epub 2006 Dec 5.
Microsatellite null alleles are commonly encountered in population genetics studies, yet little is known about their impact on the estimation of population differentiation. Computer simulations based on the coalescent were used to investigate the evolutionary dynamics of null alleles, their impact on F(ST) and genetic distances, and the efficiency of estimators of null allele frequency. Further, we explored how the existing method for correcting genotype data for null alleles performed in estimating F(ST) and genetic distances, and we compared this method with a new method proposed here (for F(ST) only). Null alleles were likely to be encountered in populations with a large effective size, with an unusually high mutation rate in the flanking regions, and that have diverged from the population from which the cloned allele state was drawn and the primers designed. When populations were significantly differentiated, F(ST) and genetic distances were overestimated in the presence of null alleles. Frequency of null alleles was estimated precisely with the algorithm presented in Dempster et al. (1977). The conventional method for correcting genotype data for null alleles did not provide an accurate estimate of F(ST) and genetic distances. However, the use of the genetic distance of Cavalli-Sforza and Edwards (1967) corrected by the conventional method gave better estimates than those obtained without correction. F(ST) estimation from corrected genotype frequencies performed well when restricted to visible allele sizes. Both the proposed method and the traditional correction method have been implemented in a program that is available free of charge at http://www.montpellier.inra.fr/URLB/. We used 2 published microsatellite data sets based on original and redesigned pairs of primers to empirically confirm our simulation results.
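As an illustration of the kind of estimator the abstract refers to, the following is a minimal sketch of an expectation-maximization (gene-counting) estimate of null allele frequency at a single locus. It is not the authors' program. It assumes Hardy-Weinberg proportions, that every non-amplifying individual is a null homozygote (no other cause of missing data), and that alleles are coded as integers. The function name `em_null_allele_frequency` and the input encoding are hypothetical, chosen only for this example.

```python
from collections import Counter


def em_null_allele_frequency(genotypes, n_iter=1000, tol=1e-10):
    """Estimate the null allele frequency r at one locus by EM.

    genotypes: list of observed phenotypes, one per individual:
      (a, a)  apparent homozygote (true genotype a/a or a/null),
      (a, b)  visible heterozygote,
      None    no amplification (assumed here to be null/null).
    Returns (r, p) where p maps each visible allele to its frequency.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")

    hom = Counter()   # apparent homozygotes per allele
    het = Counter()   # allele copies carried in visible heterozygotes
    n_blank = 0       # individuals with no amplification
    for g in genotypes:
        if g is None:
            n_blank += 1
        elif g[0] == g[1]:
            hom[g[0]] += 1
        else:
            het[g[0]] += 1
            het[g[1]] += 1

    alleles = sorted(set(hom) | set(het))
    r = 0.1  # arbitrary starting value for the null allele frequency
    p = {a: (1.0 - r) / len(alleles) for a in alleles}

    for _ in range(n_iter):
        null_copies = 2.0 * n_blank
        new_p = {}
        for a in alleles:
            # E-step: split apparent homozygotes into a/a and a/null
            # in proportion to p_a^2 : 2 * p_a * r.
            denom = p[a] + 2.0 * r
            true_hom = hom[a] * p[a] / denom if denom > 0 else 0.0
            with_null = hom[a] - true_hom
            # M-step: count allele copies among the 2n gene copies.
            new_p[a] = (2.0 * true_hom + with_null + het[a]) / (2.0 * n)
            null_copies += with_null
        new_r = null_copies / (2.0 * n)
        converged = abs(new_r - r) < tol
        p, r = new_p, new_r
        if converged:
            break
    return r, p


if __name__ == "__main__":
    # Counts built to match Hardy-Weinberg expectations for allele
    # frequencies 0.5 and 0.3 plus a null allele at 0.2 (n = 1000).
    data = [(1, 1)] * 450 + [(2, 2)] * 210 + [(1, 2)] * 300 + [None] * 40
    r_hat, p_hat = em_null_allele_frequency(data)
    print(round(r_hat, 4), {a: round(f, 4) for a, f in p_hat.items()})
```

Because the example counts were constructed from known frequencies, the estimate should settle near the value used to build them. With real data, non-amplification from causes other than null homozygosity would violate the stated assumption and bias the estimate.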