Institute of Zoology, Zoological Society of London, London, NW1 4RY, UK.
Mol Ecol Resour. 2017 Sep;17(5):981-990. doi: 10.1111/1755-0998.12650. Epub 2017 Feb 7.
The computer program Structure implements a Bayesian method, based on a population genetics model, to assign individuals to their source populations using genetic marker data. It is widely applied in ecology, evolutionary biology, human genetics and conservation biology for detecting hidden genetic structure, inferring the most likely number of populations (K), assigning individuals to source populations and estimating admixture and migration rates. Several recent simulation studies have concluded that the program yields erroneous inferences when samples from different populations are highly unbalanced in size. Analysing both simulated and empirical data sets, this study confirms that Structure indeed yields poor individual assignments to source populations and frequently gives incorrect estimates of K when sampling is unbalanced. However, this poor performance is mainly caused by the adoption of the default ancestry prior, which assumes that all source populations contribute equally to the pooled sample of individuals. When the alternative ancestry prior, which allows for unequal representation of the source populations in the sample, is adopted, accurate individual assignments can be obtained even if sampling is highly unbalanced. The alternative prior also improves the inference of K by two estimators, although the improvement is smaller than that for individual assignments to populations. For the difficult case of many populations and unbalanced sampling, a rarely used parameter combination (the alternative ancestry prior, an initial ALPHA value much smaller than the default, and the uncorrelated allele frequency model) is required for Structure to yield accurate inferences.
I conclude that Structure is easy to use but even easier to misuse, because its genetic model is complicated and the appropriate choice among its many parameter (prior) options may not be obvious. I suggest using multiple plausible models (parameters) and K estimators when conducting comparative and exploratory Structure analyses.
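The parameter combination described above can be sketched as a fragment of a Structure extraparams file. This is a minimal illustration, not the author's own configuration: it assumes that the alternative ancestry prior corresponds to POPALPHAS = 1, that the uncorrelated allele frequency model corresponds to FREQSCORR = 0, and that an initial ALPHA of 1/K is a reasonable stand-in for "much smaller than the default". The abstract states none of these values explicitly, and the helper name `extraparams_fragment` is hypothetical.

```python
def extraparams_fragment(k: int) -> str:
    """Render extraparams lines for the rarely used combination (a sketch).

    All values below are illustrative assumptions, not settings quoted
    from the abstract.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    settings = {
        # Assumption: a separate alpha per population is the
        # "alternative ancestry prior" that allows unequal representation.
        "POPALPHAS": 1,
        # Keep inferring alpha during the run; only its starting value changes.
        "INFERALPHA": 1,
        # Assumption: 1/K as an initial ALPHA well below the usual 1.0.
        "ALPHA": round(1.0 / k, 4),
        # Assumption: 0 selects the uncorrelated allele frequency model.
        "FREQSCORR": 0,
    }
    return "\n".join(f"#define {name} {value}" for name, value in settings.items())


if __name__ == "__main__":
    # Example: settings for a run that assumes K = 10 populations.
    print(extraparams_fragment(10))
```

In keeping with the recommendation to compare multiple plausible models, such a fragment would be one of several configurations run side by side (for example, alongside the default prior), not a single definitive setting.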