Biotechnology and Food Research, MTT Agrifood Research Finland, FI-31600 Jokioinen, Finland.
Genet Sel Evol. 2011 Jun 26;43(1):25. doi: 10.1186/1297-9686-43-25.
Genomic data are used in animal breeding to assist genetic evaluation. Several models for estimating genomic breeding values have been studied, and in general two approaches have been used. The first approach estimates marker effects and then obtains genomic breeding values by summing the marker effects. The second approach estimates genomic breeding values directly, using an equivalent model with a genomic relationship matrix. Allele coding is the method used to assign values to the marker covariates (the coefficients that multiply the marker effects) in the statistical model. A common allele coding assigns zero to the homozygous genotype of the first allele, one to the heterozygote, and two to the homozygous genotype of the other allele. Another common allele coding shifts these values by subtracting a constant from each marker so that the values have mean zero within each marker; we call this centered allele coding. This study considered the effects of different allele coding methods on inference. Both marker-based and equivalent models were considered, and restricted maximum likelihood and Bayesian methods were used for inference.
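A minimal Python sketch of the two codings follows. It assumes that genotypes are stored as rows (individuals) by columns (markers) of 0/1/2 allele counts. It also assumes that the centering constant for each marker is twice the observed allele frequency (the column mean) and that the relationship matrix is left unscaled as G = ZZ'. Many implementations divide G by a scaling constant, so that choice is an assumption here. The marker effects are hypothetical values rather than estimates, and the function names are illustrative only.

```python
# Sketch of 0/1/2 and centered allele coding (illustrative toy data only).
# Assumed layout: rows = individuals, columns = markers, entries = 0/1/2 allele counts.
# G = Z Z' is left unscaled (assumption).

def centered_coding(geno):
    """Subtract 2*p_j (twice the observed frequency of the counted allele) from
    marker j, so each column of the coded matrix has mean zero."""
    n, m = len(geno), len(geno[0])
    two_p = [sum(row[j] for row in geno) / n for j in range(m)]  # column means = 2*p_j
    return [[row[j] - two_p[j] for j in range(m)] for row in geno]


def relationship(z):
    """Unscaled genomic relationship matrix G = Z Z' (individuals x individuals)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in z] for ri in z]


def gebv(z, effects):
    """Marker-based approach: breeding value = sum over markers of coded genotype * effect."""
    return [sum(zij * bj for zij, bj in zip(row, effects)) for row in z]


# 0/1/2 coding: 0 = homozygous for the first allele, 1 = heterozygous, 2 = homozygous for the other
geno = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 2],
]
z_centered = centered_coding(geno)
effects = [0.5, -0.2, 0.1]  # hypothetical marker effects, not estimates

print("GEBV, 0/1/2 coding:   ", gebv(geno, effects))
print("GEBV, centered coding:", gebv(z_centered, effects))
print("G, 0/1/2 coding:   ", relationship(geno))
print("G, centered coding:", relationship(z_centered))
```

With the same marker effects, the two GEBV vectors differ by the same constant, the sum over markers of 2p_j times the marker effect, for every individual. The two relationship matrices differ element by element.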
Theoretical derivations showed that, provided the model has a fixed general mean, parameter estimates and estimated marker effects in marker-based models are the same irrespective of the allele coding. The same results hold for the equivalent models, even though different allele coding methods lead to different genomic relationship matrices. Calculated genomic breeding values are independent of allele coding when the estimate of the general mean is included in the values. Reliabilities of estimated genomic breeding values calculated from elements of the inverse of the coefficient matrix depend on the allele coding, because different allele coding methods imply different models. Finally, allele coding affects the mixing of Markov chain Monte Carlo algorithms, with centered coding giving the best mixing.
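The invariance result can be checked numerically. The sketch below is a toy example, not the authors' implementation. It solves the mixed model equations of a marker-based model with a fixed general mean, and it fits the corresponding equivalent model through V = G + λI. The variance ratio λ = σ²_e/σ²_u is treated as known, and the value 2.0, the data, and the unscaled G = ZZ' are assumptions. The reasoning is as follows. Under 0/1/2 coding, Z_012 = Z_c + 1m', where m holds the marker means. The model 1μ + Z_012 u can therefore be rewritten as 1(μ + m'u) + Z_c u. The marker effects u are unchanged, and only the estimated mean shifts by m'u.

```python
# Sketch: with a fixed general mean, marker effects and "mean + GEBV" do not depend on
# allele coding, in either the marker-based or the equivalent model.
# Assumptions: toy data, known lam = sigma_e^2 / sigma_u^2 = 2.0, unscaled G = Z Z'.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def center(geno):
    """Centered coding: subtract each marker's column mean (2 * allele frequency)."""
    n, m = len(geno), len(geno[0])
    means = [sum(row[j] for row in geno) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in geno]


def marker_model(z, y, lam):
    """Mixed model equations for y = 1*mu + Z u + e, u ~ N(0, I sigma_u^2).
    mu is fixed (not shrunk); lam is added only to the marker-effect diagonal."""
    n, m = len(z), len(z[0])
    x = [[1.0] + list(row) for row in z]  # design matrix [1 | Z]
    lhs = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(m + 1)]
           for a in range(m + 1)]
    for j in range(1, m + 1):
        lhs[j][j] += lam
    rhs = [sum(x[i][a] * y[i] for i in range(n)) for a in range(m + 1)]
    sol = solve(lhs, rhs)
    mu, u = sol[0], sol[1:]
    fitted = [mu + sum(zij * uj for zij, uj in zip(row, u)) for row in z]
    return mu, u, fitted


def equivalent_model(z, y, lam):
    """Equivalent model y = 1*mu + g + e, g ~ N(0, G sigma_u^2), G = Z Z'.
    Uses V = G + lam*I (Var(y) / sigma_u^2), so G itself need not be invertible."""
    n = len(z)
    g = [[sum(a * b for a, b in zip(ri, rj)) for rj in z] for ri in z]
    v = [[g[i][k] + (lam if i == k else 0.0) for k in range(n)] for i in range(n)]
    mu = sum(solve(v, y)) / sum(solve(v, [1.0] * n))  # GLS estimate of the mean
    w = solve(v, [yi - mu for yi in y])  # V^{-1} (y - 1 mu)
    g_hat = [sum(g[i][k] * w[k] for k in range(n)) for i in range(n)]
    return mu, [mu + gi for gi in g_hat]


geno = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [1, 2, 1, 0],
    [0, 1, 1, 1],
    [2, 1, 0, 2],
]
y = [10.2, 11.5, 9.8, 12.1, 10.7, 11.0]
lam = 2.0
zc = center(geno)

mu01, u01, fit01 = marker_model(geno, y, lam)
muc, uc, fitc = marker_model(zc, y, lam)
mu01_eq, fit01_eq = equivalent_model(geno, y, lam)
muc_eq, fitc_eq = equivalent_model(zc, y, lam)

print("marker effects, 0/1/2:   ", [round(t, 6) for t in u01])
print("marker effects, centered:", [round(t, 6) for t in uc])
print("general mean, 0/1/2 vs centered:", round(mu01, 6), round(muc, 6))
print("mean + GEBV, marker model:    ", [round(t, 6) for t in fit01])
print("mean + GEBV, equivalent model:", [round(t, 6) for t in fitc_eq])
```

Marker effects and mean-plus-GEBV agree to rounding error across codings and across the two model forms. Only the estimated general mean changes, by m'u.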
Different allele coding methods lead to the same inference in the marker-based and equivalent models when a fixed general mean is included in the model. However, reliabilities of genomic breeding values are affected by the allele coding method used. Centered coding has numerical advantages, notably better mixing, when Markov chain Monte Carlo methods are used.
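The mixing advantage can be illustrated with a toy sampler, sketched here under stated assumptions rather than as the paper's sampler. A single-site Gibbs sampler draws the general mean μ and then each marker effect in turn. Variance components are held fixed (σ²_e = 1, λ = σ²_e/σ²_u = 10), μ has a flat prior, and the data are simulated. The helper names are illustrative. With centered coding every column of Z sums to zero, so μ is a posteriori independent of the marker effects and its successive draws are essentially independent. With 0/1/2 coding, μ is strongly correlated with the marker effects, and the chain moves slowly. Lag-1 autocorrelation of the μ draws serves as a simple mixing measure.

```python
# Sketch: Gibbs mixing of the general mean under 0/1/2 vs centered coding.
# Assumptions: simulated data, fixed variances (sigma_e^2 = 1, lam = 10), flat prior on mu.
import random


def simulate(n, m, seed):
    """Hypothetical data: 0/1/2 genotypes in Hardy-Weinberg proportions, additive phenotypes."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.2, 0.8) for _ in range(m)]
    geno = [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n)]
    effects = [rng.gauss(0.0, 0.3) for _ in range(m)]
    y = [5.0 + sum(g * b for g, b in zip(row, effects)) + rng.gauss(0.0, 1.0) for row in geno]
    return geno, y


def center(geno):
    """Centered coding: subtract each marker's column mean."""
    n, m = len(geno), len(geno[0])
    means = [sum(row[j] for row in geno) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in geno]


def gibbs_mu_chain(z, y, sigma_e2, lam, n_iter, seed):
    """Single-site Gibbs sampler for y = 1*mu + Z u + e, u_j ~ N(0, sigma_e2 / lam).
    Returns the sampled values of mu."""
    rng = random.Random(seed)
    n = len(z)
    cols = [[row[j] for row in z] for j in range(len(z[0]))]
    zz = [sum(c * c for c in col) for col in cols]
    mu, u = 0.0, [0.0] * len(cols)
    e = [float(yi) for yi in y]  # current residuals y - 1*mu - Z u
    chain = []
    for _ in range(n_iter):
        # mu | u, y ~ N(mean of (y - Z u), sigma_e2 / n)
        new_mu = rng.gauss((sum(e) + n * mu) / n, (sigma_e2 / n) ** 0.5)
        e = [ei - (new_mu - mu) for ei in e]
        mu = new_mu
        for j, col in enumerate(cols):
            # u_j | rest ~ N(z_j'(e + z_j u_j) / (z_j'z_j + lam), sigma_e2 / (z_j'z_j + lam))
            lhs = zz[j] + lam
            rhs = sum(c * ei for c, ei in zip(col, e)) + zz[j] * u[j]
            new_uj = rng.gauss(rhs / lhs, (sigma_e2 / lhs) ** 0.5)
            e = [ei - c * (new_uj - u[j]) for ei, c in zip(e, col)]
            u[j] = new_uj
        chain.append(mu)
    return chain


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a chain."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


geno, y = simulate(n=50, m=10, seed=1)
burn_in = 500
chain01 = gibbs_mu_chain(geno, y, 1.0, 10.0, 2500, seed=2)[burn_in:]
chainc = gibbs_mu_chain(center(geno), y, 1.0, 10.0, 2500, seed=2)[burn_in:]
ac01, acc = lag1_autocorr(chain01), lag1_autocorr(chainc)
print("lag-1 autocorrelation of mu, 0/1/2 coding:   ", round(ac01, 3))
print("lag-1 autocorrelation of mu, centered coding:", round(acc, 3))
```

In this setting the 0/1/2 chain for μ is expected to show high autocorrelation and the centered chain autocorrelation near zero. The exact values depend on the simulated data and the seeds.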