Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances.

Author information

Su G, Christensen O F, Janss L, Lund M S

Affiliations

Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.

Publication information

J Dairy Sci. 2014 Oct;97(10):6547-59. doi: 10.3168/jds.2014-8210. Epub 2014 Aug 14.

Abstract

Various models have been used for genomic prediction. Bayesian variable selection models often predict more accurate genomic breeding values than genomic BLUP (GBLUP), but GBLUP is generally preferred for routine genomic evaluations because of its low computational demand. The objective of this study was to achieve the benefits of both models by using results from Bayesian models and genome-wide association studies as weights on single nucleotide polymorphism (SNP) markers when constructing the genomic relationship matrix (G-matrix) for genomic prediction. The data comprised 5,221 progeny-tested bulls from the Nordic Holstein population. The animals were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). Five weighting factors were investigated. Three were obtained from an analysis using a Bayesian mixture model with 4 normal distributions: the posterior SNP variance, the square of the posterior SNP effect, and the corresponding negative base-10 logarithm of the marker association P-value [-log10(P)] of a t-test. Two were obtained from an analysis using a classical genome-wide association study model (linear regression model): the square of the estimated SNP effect and the corresponding -log10(P) of a t-test. The weights were derived from analyses of data sets dated 0, 1, 3, or 5 yr before the genomic prediction was performed. In building a G-matrix, the weights were assigned either to each marker (single-marker weighting) or to each group of approximately 5 to 150 markers (group-marker weighting). The analysis was carried out for milk yield, fat yield, protein yield, fertility, and mastitis. Deregressed proofs (DRP) were used as response variables to predict genomic estimated breeding values (GEBV). Averaging over the 5 traits, the Bayesian model led to 2.0% higher reliability of GEBV than the GBLUP model with an original unweighted G-matrix. 
The superiority of GBLUP with a weighted G-matrix over GBLUP with an original unweighted G-matrix was largest when the posterior variance was used as the weighting factor, resulting in 1.7 percentage points higher reliability. The second best weighting factors were the -log10(P) of a t-test corresponding to the square of the posterior SNP effect from the Bayesian model and the -log10(P) of a t-test corresponding to the square of the estimated SNP effect from the linear regression model, followed by the square of the estimated SNP effect and the square of the posterior SNP effect. In addition, group-marker weighting performed better than single-marker weighting in terms of reducing bias of GEBV, and also slightly increased prediction reliability. The differences between weighting factors and scenarios were larger in prediction bias than in prediction accuracy. Finally, weights derived from a data set with a lag of up to 3 yr did not reduce reliability of GEBV. The results indicate that posterior SNP variance estimated from a Bayesian mixture model is a good alternative weighting factor, and that applying a common weight to each group of 30 markers is a good strategy when using markers from the 50,000-marker (50K) chip. In a population with gradually increasing reference data, the weights can be updated once every 3 yr.
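
The abstract does not give the formula used to build the weighted G-matrix, so the following is only a minimal sketch of how single-marker and group-marker weighting could look. It assumes a commonly used weighted form, G = Z D Z' / sum(2 p (1 - p) d), where Z holds centered 0/1/2 genotypes, p the allele frequencies, and D a diagonal matrix of marker weights d. The function names, the scaling of the weights to a mean of 1, and the use of the plain group mean as the common group weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def group_average_weights(weights, group_size):
    """Give every marker the mean weight of its group of adjacent markers.

    Assumption: groups are consecutive, non-overlapping blocks of
    `group_size` markers; the last block may be shorter.
    """
    weights = np.asarray(weights, dtype=float)
    grouped = np.empty_like(weights)
    for start in range(0, weights.size, group_size):
        block = slice(start, start + group_size)
        grouped[block] = weights[block].mean()
    return grouped


def weighted_g_matrix(genotypes, weights=None):
    """Build a (weighted) genomic relationship matrix.

    genotypes: (n_animals, n_markers) array of allele counts coded 0/1/2.
    weights:   per-marker weights d; None gives the unweighted G-matrix.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per marker
    Z = M - 2.0 * p                   # center each marker column
    if weights is None:
        d = np.ones(M.shape[1])
    else:
        d = np.asarray(weights, dtype=float)
    d = d / d.mean()                  # assumption: scale weights to mean 1
    denom = np.sum(2.0 * p * (1.0 - p) * d)
    return (Z * d) @ Z.T / denom      # Z D Z' / sum(2 p (1 - p) d)


# Toy data only: random genotypes and random positive weights standing in
# for, e.g., posterior SNP variances from a Bayesian mixture model.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(6, 60))
toy_weights = rng.gamma(shape=1.0, scale=1.0, size=60)

G_unweighted = weighted_g_matrix(toy_genotypes)
G_single = weighted_g_matrix(toy_genotypes, toy_weights)
G_group = weighted_g_matrix(
    toy_genotypes, group_average_weights(toy_weights, group_size=30)
)
print(G_unweighted.shape, G_single.shape, G_group.shape)
```

Under these assumptions, the weighted matrix can replace the unweighted one in an otherwise unchanged GBLUP analysis. Averaging weights within groups smooths out noisy single-marker estimates, which fits the abstract's finding that group-marker weighting reduced the bias of GEBV.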
