Ou Zhining, Tempelman Robert J, Steibel Juan P, Ernst Catherine W, Bates Ronald O, Bello Nora M
Department of Statistics, Kansas State University, Manhattan, Kansas 66506.
Department of Animal Science, Michigan State University, East Lansing, Michigan 48824.
G3 (Bethesda). 2015 Nov 12;6(1):1-13. doi: 10.1534/g3.115.022897.
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict the genetic merit of animals and plants typically assume homogeneous residual variance. However, residual variability is often heterogeneous across agricultural production systems and may consequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications, and variable selection to explicitly account for environmentally driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Models were compared using pseudo-Bayes factors and the prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous and heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of the competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in the validation sets of a five-fold cross-validation, was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared with homoskedastic error WGP models, although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially for individuals of extreme genetic merit.
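The predictive ability measure described in the abstract (correlation between predicted and observed phenotypes across the validation sets of a five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' hierarchical Bayesian model: it swaps in a ridge regression on marker genotypes, and it treats the residual standard deviation of each record as known so that heteroskedasticity can be handled with inverse-variance weights. The function names (`ridge_fit`, `cv_predictive_ability`, `simulate`), the simulated data, and the shrinkage value `lam = 100` are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam, weights=None):
    """Weighted ridge regression with an unpenalized intercept.

    Stand-in for a WGP model (assumption): minimizes
    sum_i w_i * (y_i - mu - x_i'b)^2 + lam * ||b||^2.
    """
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    x_bar = np.average(X, axis=0, weights=w)
    y_bar = np.average(y, weights=w)
    Xc, yc = X - x_bar, y - y_bar
    XtW = Xc.T * w  # p x n; column i is scaled by the weight of record i
    b = np.linalg.solve(XtW @ Xc + lam * np.eye(p), XtW @ yc)
    return y_bar - x_bar @ b, b


def cv_predictive_ability(X, y, lam, weights=None, k=5, seed=0):
    """Mean correlation between predicted and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w = None if weights is None else np.asarray(weights)[train]
        mu, b = ridge_fit(X[train], y[train], lam, w)
        y_hat = mu + X[test] @ b
        cors.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(cors))


def simulate(n=500, p=200, seed=1):
    """Hypothetical data: two environments with residual SD 1 and 3."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes 0/1/2
    b = rng.normal(0.0, 0.1, size=p)                      # marker effects
    env = rng.integers(0, 2, size=n)
    sd = np.where(env == 0, 1.0, 3.0)
    y = X @ b + rng.normal(0.0, sd)
    return X, y, sd


if __name__ == "__main__":
    X, y, sd = simulate()
    lam = 100.0  # assumed shrinkage; not tuned
    print("homoskedastic fit:  ", cv_predictive_ability(X, y, lam))
    print("heteroskedastic fit:", cv_predictive_ability(X, y, lam, 1.0 / sd**2))
```

In the study the residual variances are estimated within the Bayesian hierarchy rather than supplied, so the known weights here only mimic the effect of down-weighting records from noisier environments. The two printed correlations need not reproduce the small gain reported in the abstract.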