Fisheries Ecology Division, Southwest Fisheries Science Center, 110 Shaffer Road, Santa Cruz, CA 95060, USA.
Mol Ecol Resour. 2008 Nov;8(6):1219-29. doi: 10.1111/j.1755-0998.2008.02355.x.
Unsupervised clustering algorithms, such as the program Structure, are increasingly used to infer the presence of population structure from a sample of genotyped individuals. We evaluate the extent to which the presence of related individuals can lead such algorithms to the false inference that there is population structure. First, we demonstrate this problem using a real data set from a rainbow trout (Oncorhynchus mykiss) population. Then we perform an extensive series of simulations involving the program Structure. Our simulations encompass both a simple scenario with fixed numbers of full- and half-siblings in the sample, and a more complicated scenario in which we investigate 360 combinations of population divergence, fraction of population sampled, variance in family size, mating system and number of loci. We find that the inclusion of family members in a sample may produce very strong evidence of population structure, even when population structure is absent. This problem becomes more pronounced when more loci are genotyped, and it is particularly likely in studies of monogamous species, especially if variance in family size is high and a large fraction of a small population has been sampled. Researchers working in such situations should test observed clusters for the presence of family members to distinguish family-induced structure from real population structure. Additionally, this work shows that Structure's ability to estimate the number of subpopulations may be influenced by a number of factors, and its estimates should therefore be interpreted with caution.
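The effect described above can be illustrated with a toy simulation. The sketch below is not the program Structure and does not reproduce the simulations reported here. It only shows, under simplifying assumptions (unlinked biallelic loci, allele frequencies drawn uniformly between 0.1 and 0.9, a single full-sibling family from one unstructured population), that siblings tend to share more alleles with each other than unrelated individuals do, which is the kind of signal a clustering algorithm may mistake for population structure. All function names and parameter values are hypothetical and chosen for illustration.

```python
import random
from itertools import combinations


def random_genotype(freqs, rng):
    """Draw a diploid genotype: two alleles (0 or 1) per locus."""
    return [(int(rng.random() < p), int(rng.random() < p)) for p in freqs]


def offspring(mother, father, rng):
    """Mendelian inheritance: one randomly chosen allele per parent per locus."""
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


def allele_sharing(a, b):
    """Mean identity-by-state similarity in [0, 1], based on allele counts."""
    diffs = [abs(sum(x) - sum(y)) for x, y in zip(a, b)]
    return 1.0 - sum(diffs) / (2.0 * len(diffs))


def mean_pairwise(group):
    """Average allele sharing over all pairs of individuals in a group."""
    pairs = list(combinations(group, 2))
    return sum(allele_sharing(a, b) for a, b in pairs) / len(pairs)


def simulate(n_loci=100, n_sibs=10, n_unrelated=40, seed=1):
    """Simulate one full-sib family and unrelated individuals from the
    same unstructured population (assumed settings, not the paper's)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    mother = random_genotype(freqs, rng)
    father = random_genotype(freqs, rng)
    sibs = [offspring(mother, father, rng) for _ in range(n_sibs)]
    unrelated = [random_genotype(freqs, rng) for _ in range(n_unrelated)]
    return mean_pairwise(sibs), mean_pairwise(unrelated)


sib_mean, unrelated_mean = simulate()
print(f"mean sharing among full siblings: {sib_mean:.3f}")
print(f"mean sharing among unrelated:     {unrelated_mean:.3f}")
```

Increasing `n_loci` in this sketch makes the gap between the two averages more stable, which is in line with the finding that the problem becomes more pronounced when more loci are genotyped.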