Fernando Rohan L, Dekkers Jack CM, Garrick Dorian J
Department of Animal Science, Iowa State University, 50011 Ames, Iowa, USA.
Genet Sel Evol. 2014 Sep 22;46(1):50. doi: 10.1186/1297-9686-46-50.
To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase.
A strategy is presented for single-step Bayesian regression (SSBR) models that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores.
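As a rough illustration of imputing marker covariates for non-genotyped animals, the sketch below regresses genotypes on pedigree relationships, computing M_n_hat = A_ng A_gg^{-1} M_g, and also returns the relationship structure left over for the residual genetic effect, A_nn - A_ng A_gg^{-1} A_gn. The abstract does not spell out the imputation formula, so this pedigree-based linear form, the four-animal pedigree, the genotype codes, and the function names `relationship_matrix` and `impute_markers` are all assumptions made for illustration. The dense solve is used only for readability and is not meant to reflect how a large-scale analysis would be implemented.

```python
import numpy as np

def relationship_matrix(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    Animals are indexed 0..n-1 with parents listed before offspring;
    -1 marks an unknown parent. (Hypothetical helper for this sketch.)
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + inbreeding coefficient = 1 + 0.5 * a(sire, dam)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A

def impute_markers(A, M_g, genotyped, non_genotyped):
    """Impute marker covariates as A_ng A_gg^{-1} M_g (assumed form).

    Returns the imputed covariates and A_nn - A_ng A_gg^{-1} A_gn, the
    relationship structure assumed here for the residual genetic effect.
    """
    A_gg = A[np.ix_(genotyped, genotyped)]
    A_ng = A[np.ix_(non_genotyped, genotyped)]
    A_nn = A[np.ix_(non_genotyped, non_genotyped)]
    # W = A_ng A_gg^{-1}; A_gg is symmetric, so solve and transpose
    W = np.linalg.solve(A_gg, A_ng.T).T
    M_hat = W @ M_g
    resid_cov = A_nn - W @ A_ng.T
    return M_hat, resid_cov

# Toy pedigree (made up): 0 and 1 are founders, 2 = 0 x 1, 3 = 2 x 1
sires = [-1, -1, 0, 2]
dams = [-1, -1, 1, 1]
A = relationship_matrix(sires, dams)

# Made-up genotypes (0/1/2 allele counts) at three SNPs for animals 0 and 1
M_g = np.array([[0.0, 2.0, 1.0],
                [2.0, 2.0, 1.0]])

M_hat, resid_cov = impute_markers(A, M_g, genotyped=[0, 1], non_genotyped=[2, 3])
print(M_hat)
print(resid_cov)
```

In this toy pedigree, animal 2 (offspring of the two genotyped founders) receives the average of its parents' genotypes, and the value 0.5 on its diagonal of the residual matrix corresponds to its Mendelian sampling variance in units of the additive genetic variance.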
In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so they can be used for routine applications.
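The reported figures (a 58-fold speedup on 120 cores, against a 100- to 200-fold increase in computing time) can be put in perspective with some simple arithmetic. The sketch below is only a back-of-the-envelope calculation: the function names are hypothetical, the use of Amdahl's law to back out an implied serial fraction is an assumption not made in the source, and the net-time estimate assumes the same speedup would hold at the larger problem size.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / cores

def amdahl_serial_fraction(speedup, cores):
    """Serial fraction s implied by Amdahl's law (an assumption here).

    Solves speedup = 1 / (s + (1 - s) / cores) for s.
    """
    return (cores / speedup - 1.0) / (cores - 1.0)

def net_time_ratio(fold_increase, speedup):
    """Computing time relative to today's serial analysis, assuming the
    speedup carries over unchanged to the larger data set."""
    return fold_increase / speedup

efficiency = parallel_efficiency(58, 120)
serial_fraction = amdahl_serial_fraction(58, 120)
low, high = net_time_ratio(100, 58), net_time_ratio(200, 58)

print(f"efficiency: {efficiency:.2f}")
print(f"implied serial fraction: {serial_fraction:.4f}")
print(f"net time vs. current serial run: {low:.2f}x to {high:.2f}x")
```

Under these assumptions, the parallel run uses just under half of the ideal capacity of the 120 cores, and a full-population analysis would still take somewhat longer than a current serial analysis of a few thousand genotyped animals.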