Biosciences Research Division, Department of Primary Industries, Bundoora, VIC 3083, Australia.
J Anim Sci. 2012 Oct;90(10):3375-84. doi: 10.2527/jas.2011-4557.
In genome-wide association studies, failure to remove variation due to population structure results in spurious associations. In contrast, for predictions of future phenotypes or estimated breeding values from dense SNP data, exploiting population structure arising from relatedness can actually increase the accuracy of prediction in some cases, for example, when the selection candidates are offspring of the reference population from which the prediction equation was derived. In populations with a large effective population size, or with multiple breeds and strains, it has not been demonstrated whether and when accounting for or removing variation due to population structure will affect the accuracy of genomic prediction. Our aim in this study was to determine whether accounting for population structure would increase the accuracy of genomic predictions, both within and across breeds. First, we attempted to decompose the accuracy of genomic prediction into contributions from population structure and from linkage disequilibrium (LD) between markers and QTL, using a diverse multi-breed sheep (Ovis aries) data set genotyped for 48,640 SNP. We demonstrate that genomic predictions based on SNP from a single chromosome can achieve up to 86% of the accuracy obtained using all SNP. This result suggests that most of the prediction accuracy is due to population structure, because a single chromosome is expected to capture relationships between animals but is unlikely to contain all QTL. We then explored principal component analysis (PCA) as an approach to disentangle the respective contributions of population structure and of LD between SNP and QTL to the accuracy of genomic predictions. Results showed that fitting an increasing number of principal components (PC) as covariates decreased within-breed accuracy until a lower plateau was reached. We speculate that this plateau is a measure of the accuracy due to LD.
In conclusion, a large proportion of the accuracy of genomic predictions in our data was due to variation associated with population structure. Surprisingly, accounting for this structure generally decreased the accuracy of across-breed genomic predictions.
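The PCA approach described in the abstract, fitting PCs as covariates and tracking how prediction accuracy changes, can be illustrated with a minimal Python/NumPy sketch. This is not the authors' model. It assumes a ridge-regression SNP-BLUP with a hypothetical shrinkage parameter `lam`, PCs computed from all genotyped animals (reference plus test), PCs and an intercept fitted as unpenalised fixed effects, a predicted breeding value built from the SNP effects only (so variation absorbed by the PCs is excluded), and accuracy measured as the Pearson correlation with observed test phenotypes. The function names `pc_covariates`, `snp_blup_with_pcs` and `accuracy_with_pcs` are hypothetical.

```python
import numpy as np


def pc_covariates(genotypes, n_pcs):
    """Top n_pcs principal-component scores of a column-centred genotype matrix."""
    centred = genotypes - genotypes.mean(axis=0)
    # SVD of the centred matrix: centred = U S Vt, so the PC scores are U * S.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def snp_blup_with_pcs(x_train, y_train, pcs_train, lam):
    """Ridge regression of y on SNPs, with intercept and PCs as unpenalised fixed effects.

    Assumption: a single shrinkage value `lam` is applied to all SNP effects.
    """
    n, p = x_train.shape
    fixed = np.column_stack([np.ones(n), pcs_train])  # intercept + k PC covariates
    k = fixed.shape[1]
    design = np.hstack([fixed, x_train])
    penalty = np.zeros(k + p)
    penalty[k:] = lam  # shrink SNP effects only; fixed effects are not penalised
    lhs = design.T @ design + np.diag(penalty)
    coef = np.linalg.solve(lhs, design.T @ y_train)
    return coef[:k], coef[k:]


def accuracy_with_pcs(x_train, y_train, x_test, y_test, n_pcs, lam=1.0):
    """Correlation between SNP-only predictions and test phenotypes after fitting n_pcs PCs."""
    all_geno = np.vstack([x_train, x_test])
    centred = all_geno - all_geno.mean(axis=0)
    pcs = pc_covariates(all_geno, n_pcs)  # assumption: PCs from all genotyped animals
    n_train = x_train.shape[0]
    _, snp_effects = snp_blup_with_pcs(centred[:n_train], y_train, pcs[:n_train], lam)
    # Predicted breeding values use SNP effects only; the PC term is left out.
    gebv = centred[n_train:] @ snp_effects
    return np.corrcoef(gebv, y_test)[0, 1]
```

Calling `accuracy_with_pcs` for `n_pcs = 0, 1, 2, ...` and plotting the result is one way to look for a lower plateau like the one the abstract describes. Passing only the SNP columns of one chromosome, and dividing by the all-SNP accuracy, gives a single-chromosome comparison of the kind reported for the sheep data.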