Calus Mario Pl, Schrooten Chris, Veerkamp Roel F
Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, P.O. Box 338, Wageningen, 6700 AH, The Netherlands.
CRV BV, Arnhem, 6800 AL, The Netherlands.
Genet Sel Evol. 2014 Sep 25;46(1):52. doi: 10.1186/s12711-014-0052-x.
Genomic prediction requires estimating the variances of single nucleotide polymorphism (SNP) effects, which is computationally demanding, and then uses these variances for prediction. We developed models that separate the estimation of SNP variances, which can be done infrequently, from genomic prediction, which can be run routinely.
SNP variances were estimated with Bayes Stochastic Search Variable Selection (BSSVS) and BayesC. Genome-enhanced breeding values (GEBV) were estimated with RR-BLUP (ridge regression best linear unbiased prediction), using either variances obtained from BSSVS (BLUP-SSVS) or BayesC (BLUP-C), or assuming equal variances for each SNP. Datasets used to estimate SNP variances comprised (1) all animals, (2) 50% random animals (RAN50), (3) 50% best animals (TOP50), or (4) 50% worst animals (BOT50). Traits analysed were protein yield, udder depth, somatic cell score, interval between first and last insemination, direct longevity, and longevity including information from predictors.
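The two-step idea described above can be sketched as a BLUP solve in which each SNP has its own shrinkage factor, the ratio of residual variance to that SNP's variance. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `blup_snp_effects` and `gebv`, the single fixed mean, and the direct dense solve are hypothetical choices. The SNP variances are assumed to come from an earlier BSSVS or BayesC run.

```python
import numpy as np

def blup_snp_effects(Z, y, snp_var, resid_var):
    """Solve mixed model equations for a mean and SNP effects.

    Z         : (n_animals, n_snps) genotype matrix
    y         : (n_animals,) phenotypes
    snp_var   : (n_snps,) variance assumed for each SNP effect
    resid_var : residual variance
    """
    n = Z.shape[0]
    W = np.hstack([np.ones((n, 1)), Z])      # design: mean + SNPs
    lhs = W.T @ W
    # SNP-specific shrinkage: add resid_var / snp_var_j to the diagonal
    lhs[1:, 1:] += np.diag(resid_var / np.asarray(snp_var, dtype=float))
    rhs = W.T @ y
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]                   # mean, SNP effects

def gebv(Z, alpha):
    """Genomic breeding values as the sum of SNP effects per animal."""
    return Z @ alpha
```

Passing the same value for every element of `snp_var` corresponds to the equal-variance RR-BLUP case, while passing variances from a Bayesian variable selection run corresponds to the BLUP-SSVS or BLUP-C idea.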
BLUP-SSVS and BLUP-C yielded GEBV similar to those from the equivalent Bayesian models that simultaneously estimated SNP variances. Reliabilities of these GEBV were consistently higher than those from RR-BLUP, although the difference was significant only for direct longevity. Across scenarios that used data subsets to estimate GEBV, observed reliabilities were generally higher for TOP50 than for RAN50, and much higher than for BOT50. Reliabilities of TOP50 were higher because the training data contained more ancestors of selection candidates. Using estimated SNP variances based on random or non-random subsets of the data, while using all data to estimate GEBV, did not affect reliabilities of the BLUP models. A convergence criterion of 10^(-8) instead of 10^(-10) for BLUP models yielded similar GEBV, while the required number of iterations decreased by 71 to 90%. Including a separate polygenic effect consistently improved reliabilities of the GEBV, but also substantially increased the number of iterations required to reach convergence with RR-BLUP. SNP variances converged faster for BayesC than for BSSVS.
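The trade-off between convergence criterion and iteration count can be illustrated with a small iterative solver. The abstract does not state which solver or which definition of the criterion was used, so the following is only a sketch under assumptions: Gauss-Seidel iteration, and a criterion defined as the sum of squared changes in solutions divided by the sum of squared solutions. The function name `gauss_seidel` is hypothetical.

```python
import numpy as np

def gauss_seidel(lhs, rhs, tol, max_iter=100_000):
    """Solve lhs @ sol = rhs iteratively; return (solution, iterations).

    Assumed criterion: sum((new - old)^2) / sum(new^2) < tol.
    """
    sol = np.zeros_like(rhs, dtype=float)
    for it in range(1, max_iter + 1):
        old = sol.copy()
        for i in range(len(rhs)):
            # contribution of all other unknowns, using the latest values
            off_diag = lhs[i] @ sol - lhs[i, i] * sol[i]
            sol[i] = (rhs[i] - off_diag) / lhs[i, i]
        denom = np.sum(sol ** 2)
        if denom == 0.0 or np.sum((sol - old) ** 2) / denom < tol:
            return sol, it
    return sol, max_iter

# Toy system (random data, for illustration only)
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 10))
lhs = M.T @ M + 5.0 * np.eye(10)   # symmetric positive definite
rhs = M.T @ rng.normal(size=50)

for tol in (1e-8, 1e-10):
    _, n_iter = gauss_seidel(lhs, rhs, tol)
    print(f"tol={tol:g}: {n_iter} iterations")
```

A looser tolerance stops the loop earlier while the solutions change only slightly, which is the kind of saving the reported 71 to 90% reduction refers to. The size of the saving on this toy system is not comparable to the real data.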
Combining Bayesian variable selection models that re-estimate SNP variances with BLUP models that use those SNP variances yields GEBV that are similar to those from full Bayesian models. Moreover, these combined models yield predictions with higher reliability and less bias than the commonly used RR-BLUP model.