School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.
PLoS One. 2012;7(9):e45170. doi: 10.1371/journal.pone.0045170. Epub 2012 Sep 12.
One of the most common questions asked before starting a new population genetic study using microsatellite allele frequencies is "how many individuals do I need to sample from each population?" This question has previously been answered by addressing how many individuals are needed to detect all of the alleles present in a population (i.e. rarefaction-based analyses). However, we argue that obtaining accurate allele frequencies and accurate estimates of diversity is much more important than detecting all of the alleles, given that very rare alleles (i.e. new mutations) are not very informative for assessing genetic diversity within a population or genetic structure among populations. Here we present a comparison of allele frequencies, expected heterozygosities and genetic distances between real and simulated populations by randomly subsampling 5-100 individuals from four empirical microsatellite genotype datasets (Formica lugubris, Sciurus vulgaris, Thalassarche melanophris, and Himantopus novaezelandiae) to create 100 replicate datasets at each sample size. Despite differences in taxon (two birds, one mammal, one insect), population size, number of loci and polymorphism across loci, the degree of difference between simulated and empirical allele frequencies, expected heterozygosities and pairwise F(ST) values was almost identical among the four datasets at each sample size. Variability in allele frequency and expected heterozygosity among replicates decreased with increasing sample size, but these decreases were minimal above sample sizes of 25 to 30. Therefore, there appears to be little benefit in sampling more than 25 to 30 individuals per population for population genetic studies based on microsatellite allele frequencies.
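The subsampling design described above can be illustrated with a short sketch. This is not the authors' code and does not use their data: the population is synthetic, the allele names and weights are made up, all function names are hypothetical, and expected heterozygosity is taken as 1 minus the sum of squared allele frequencies at a single locus with no small-sample correction, which is an assumption since the abstract does not state the estimator used.

```python
import random
import statistics
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of diploid genotypes (a1, a2)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2) at one locus (assumed estimator, uncorrected)."""
    freqs = allele_frequencies(genotypes)
    return 1.0 - sum(p * p for p in freqs.values())


def make_population(n_individuals, allele_weights, rng):
    """Build a synthetic single-locus population (hypothetical stand-in
    for an empirical genotype dataset)."""
    alleles = list(allele_weights)
    weights = list(allele_weights.values())
    return [tuple(rng.choices(alleles, weights=weights, k=2))
            for _ in range(n_individuals)]


def subsample_he_spread(population, sample_size, replicates=100, rng=None):
    """Draw `replicates` random subsamples without replacement and return
    the mean and standard deviation of He across them."""
    rng = rng or random.Random()
    values = [expected_heterozygosity(rng.sample(population, sample_size))
              for _ in range(replicates)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up allele weights, including one rare allele.
    weights = {"A1": 0.40, "A2": 0.30, "A3": 0.20, "A4": 0.08, "A5": 0.02}
    population = make_population(500, weights, rng)
    print(f"full population He = {expected_heterozygosity(population):.3f}")
    for n in (5, 10, 25, 30, 50, 100):
        mean_he, sd_he = subsample_he_spread(population, n, rng=rng)
        print(f"n={n:3d}  mean He={mean_he:.3f}  SD among replicates={sd_he:.3f}")
```

Running the script prints the spread of expected heterozygosity among 100 replicates at each sample size, which is the quantity whose decrease the abstract reports as levelling off. Extending the sketch to multiple loci or to pairwise F(ST) would need per-locus genotype tables and a second population, neither of which is modelled here.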