Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames 50011, USA.
J Anim Sci. 2010 Jan;88(1):32-46. doi: 10.2527/jas.2009-1975. Epub 2009 Sep 11.
In livestock, genomic selection (GS) has primarily been investigated by simulation of purebred populations. Traits of interest are, however, often measured in crossbred or mixed populations with uncertain breed composition. If such data are used as the training data for GS without accounting for breed composition, estimates of marker effects may be biased due to population stratification and admixture. To investigate this, a genome of 100 cM was simulated with varying marker densities (5 to 40 segregating markers per cM). After 1,000 generations of random mating in a population of effective size 500, 4 lines with effective size 100 were isolated and mated for another 50 generations to create 4 pure breeds. These breeds were used to generate combined, F1, F2, 3- and 4-way crosses, and admixed training data sets of 1,000 individuals with phenotypes for an additive trait controlled by 100 segregating QTL and heritability of 0.30. The validation data set was a sample of 1,000 genotyped individuals from one pure breed. Method Bayes-B was used to simultaneously estimate the effects of all markers for breeding value estimation. With 5 (40) markers per cM, the correlation of true with estimated breeding value of selection candidates (accuracy) was greatest, 0.79 (0.85), when data from the same pure breed were used for training. When the training data set consisted of crossbreds, the accuracy ranged from 0.66 (0.79) to 0.74 (0.83) for the 2 marker densities, respectively. The admixed training data set resulted in nearly the same accuracies as when training was in the breed to which selection candidates belonged. However, accuracy was greatly reduced when genes from the target pure breed were not included in the admixed or crossbred population.
This implies that, with high-density markers, admixed and crossbred populations can be used to develop GS prediction equations for all pure breeds that contributed to the population, without a substantial loss of accuracy compared with training on purebred data, even if breed origin has not been explicitly taken into account. In addition, using GS based on high-density marker data, purebreds can be accurately selected for crossbred performance without the need for pedigree or breed information. Results also showed that haplotype segments with strong linkage disequilibrium are shorter in crossbred and admixed populations than in purebreds, providing opportunities for QTL fine mapping.
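The accuracy measure used above, the correlation of true with estimated breeding value in a validation set, can be illustrated with a minimal Python sketch. It is not the authors' simulation: it assumes independent markers drawn from fixed allele frequencies (no drift, breeds, crosses, or linkage disequilibrium), places the QTL among the markers, and swaps Bayes-B for ridge regression (SNP-BLUP) to keep the example short. The function names, the default of 500 markers, and the choice of shrinkage parameter are all assumptions made for illustration; only the 1,000 training and validation individuals, the 100 QTL, and the heritability of 0.30 come from the abstract.

```python
import numpy as np


def simulate_genotypes(n_ind, freqs, rng):
    """Draw 0/1/2 allele counts for independent markers (assumption: no LD)."""
    return rng.binomial(2, freqs, size=(n_ind, freqs.size)).astype(float)


def ridge_marker_effects(X, y, lam):
    """Estimate all marker effects jointly by ridge regression (SNP-BLUP).

    This is a stand-in for Bayes-B, not an implementation of it.
    """
    mu_x = X.mean(axis=0)
    Xc = X - mu_x                      # centre genotypes
    yc = y - y.mean()                  # centre phenotypes
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ yc)
    return beta, mu_x


def accuracy(tbv, ebv):
    """Accuracy as the correlation of true with estimated breeding value."""
    return float(np.corrcoef(tbv, ebv)[0, 1])


def run_sketch(seed=1, n_train=1000, n_valid=1000,
               n_markers=500, n_qtl=100, h2=0.30):
    rng = np.random.default_rng(seed)

    # Hypothetical allele frequencies shared by training and validation sets.
    freqs = rng.uniform(0.1, 0.9, size=n_markers)
    X_train = simulate_genotypes(n_train, freqs, rng)
    X_valid = simulate_genotypes(n_valid, freqs, rng)

    # Additive trait: n_qtl markers carry a non-zero effect, the rest are zero.
    alpha = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    alpha[qtl] = rng.normal(size=n_qtl)
    tbv_train = X_train @ alpha
    tbv_valid = X_valid @ alpha

    # Scale the residual variance so that var(g) / var(y) is close to h2.
    var_e = tbv_train.var() * (1.0 - h2) / h2
    y_train = tbv_train + rng.normal(scale=np.sqrt(var_e), size=n_train)

    # Shrinkage: residual variance over per-marker effect variance,
    # assuming genetic variance is spread evenly over sum(2pq).
    lam = np.sum(2.0 * freqs * (1.0 - freqs)) * (1.0 - h2) / h2

    beta, mu_x = ridge_marker_effects(X_train, y_train, lam)
    ebv_valid = (X_valid - mu_x) @ beta

    realized_h2 = tbv_train.var() / y_train.var()
    return accuracy(tbv_valid, ebv_valid), realized_h2


if __name__ == "__main__":
    acc, realized_h2 = run_sketch()
    print(f"accuracy = {acc:.2f}, realized heritability = {realized_h2:.2f}")
```

Because validation phenotypes are never used, the reported accuracy depends only on how well marker effects estimated in the training set carry over to the validation genotypes, which is the quantity the abstract compares across purebred, crossbred, and admixed training sets.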