Martini Johannes W R, Gao Ning, Cardoso Diercles F, Wimmer Valentin, Erbe Malena, Cantet Rodolfo J C, Simianer Henner
Department of Animal Sciences, Georg-August University, Albrecht Thaer-Weg 3, Göttingen, Germany.
National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China.
BMC Bioinformatics. 2017 Jan 3;18(1):3. doi: 10.1186/s12859-016-1439-1.
Epistasis marker effect models that include products of marker values as predictor variables in a linear regression (extended GBLUP, EGBLUP) have been assessed as potentially beneficial for genomic prediction, but their performance depends on the marker coding. Although this dependence has been recognized in the literature, the nature of the problem has not yet been thoroughly investigated.
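The coding dependence can be made concrete with a small numerical sketch. It assumes that the interaction predictors are the products of marker values over all locus pairs j < k and that the resulting similarity between individuals is the Gram matrix of those products. The function names, the toy genotypes, and the omission of any scaling are illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from itertools import combinations


def explicit_interaction_kernel(M):
    """Gram matrix built directly from the products M[:, j] * M[:, k], j < k."""
    pairs = combinations(range(M.shape[1]), 2)
    Z = np.column_stack([M[:, j] * M[:, k] for j, k in pairs])
    return Z @ Z.T


def pairwise_interaction_kernel(M):
    """Same Gram matrix via the identity
    sum_{j<k} a_j a_k b_j b_k = 0.5 * ((a.b)^2 - sum_j a_j^2 b_j^2)."""
    G = M @ M.T
    M2 = M * M
    return 0.5 * (G * G - M2 @ M2.T)


# Hypothetical toy data: three individuals, three loci, coded as allele counts.
M012 = np.array([[0.0, 0.0, 0.0],   # homozygous for one allele at every locus
                 [2.0, 2.0, 2.0],   # homozygous for the other allele
                 [1.0, 1.0, 1.0]])  # heterozygous at every locus
Msym = M012 - 1.0                   # the same genotypes in symmetric coding {-1, 0, 1}

K012 = pairwise_interaction_kernel(M012)
Ksym = pairwise_interaction_kernel(Msym)

print(K012)  # individual 0 has zero similarity with everyone, including itself
print(Ksym)  # individuals 0 and 1 are as similar to each other as to themselves
```

Under the {0, 1, 2} coding the first individual contributes nothing to any interaction term, whereas under the symmetric coding the two opposite homozygotes become indistinguishable and the heterozygote drops out. The two matrices are not rescaled versions of each other, so shifting the coding changes the implied interaction model rather than merely its scale.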
We illustrate how the choice of marker coding implicitly specifies the model of how effects of certain allele combinations at different loci contribute to the phenotype, and investigate coding-dependent properties of EGBLUP. Moreover, we discuss an alternative categorical epistasis model (CE) eliminating undesired properties of EGBLUP and show that the CE model can improve predictive ability. Finally, we demonstrate that the coding-dependent performance of EGBLUP offers the possibility to incorporate prior experimental information into the prediction method by adapting the coding to already available phenotypic records on other traits.
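A categorical treatment of two-locus genotypes can be sketched in the same spirit. The sketch assumes one reading of the CE idea: each combination of genotypes at a pair of loci is a category of its own, so two individuals are similar at a locus pair only if they carry the same genotype at both loci. The function names, the restriction to pairs j < k, and the absence of scaling are assumptions for illustration; the abstract does not state the exact form of the CE covariance.

```python
import numpy as np


def shared_genotype_counts(M):
    """S[i, l] = number of loci at which individuals i and l have the same genotype."""
    return (M[:, None, :] == M[None, :, :]).sum(axis=2)


def categorical_pair_kernel(M):
    """Number of locus pairs (j < k) at which two individuals share the
    same two-locus genotype: S choose 2 for S matching loci."""
    S = shared_genotype_counts(M)
    return S * (S - 1) // 2


# Hypothetical toy genotypes: three individuals, three loci.
M012 = np.array([[0, 1, 2],
                 [0, 1, 0],
                 [2, 1, 0]])

C_counts = categorical_pair_kernel(M012)         # coding {0, 1, 2}
C_sym = categorical_pair_kernel(M012 - 1)        # coding {-1, 0, 1}
C_swapped = categorical_pair_kernel(2 - M012)    # reference allele swapped

print(C_counts)
```

Because the kernel uses only equality of genotypes, any relabelling of the genotype classes gives the same matrix. This is the kind of coding invariance that a product-based interaction model lacks.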
Based on our results, a symmetric coding {-1,1} or {-1,0,1} should be preferred for EGBLUP, whereas standardization using allele frequencies should be avoided. Moreover, CE can be a valuable alternative, since it does not possess the undesired theoretical properties of EGBLUP. However, which model performs best will depend on the characteristics of the data and on the available prior information. Data from previous experiments can, for instance, be incorporated into the marker coding of EGBLUP.