Gianola Daniel, Cecchinato Alessio, Naya Hugo, Schön Chris-Carolin
Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, United States.
Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, United States.
Front Genet. 2018 Jun 5;9:195. doi: 10.3389/fgene.2018.00195. eCollection 2018.
A widely used method for prediction of complex traits in animal and plant breeding is "genomic best linear unbiased prediction" (GBLUP). In a quantitative genetics setting, BLUP is a linear regression of phenotypes on a pedigree or on a genomic relationship matrix, depending on the type of input information available. Normality of the distributions of random effects and of model residuals is not required for BLUP, but a Gaussian assumption is made implicitly. A potential downside is that Gaussian linear regressions are sensitive to outliers, whether genetic or environmental in origin. We present robust alternatives to BLUP that are simple to implement (relative to a fully Bayesian analysis), using a linear model with residual t or Laplace distributions instead of a Gaussian one. The methods are evaluated with milk yield records on Italian Brown Swiss cattle, grain yield data in inbred wheat lines, and three traits measured on accessions of Arabidopsis thaliana. The methods do not use Markov chain Monte Carlo sampling, and model hyper-parameters, viewed here as regularization "knobs," are tuned by cross-validation. Uncertainty of predictions is evaluated by bootstrapping or by random reconstruction of training and testing sets. For several traits (e.g., test-day milk yield in cows, flowering time and FRIGIDA expression in Arabidopsis), the best predictions were often those obtained with the robust methods. The results are encouraging and motivate further investigation and generalization.
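The contrast between a Gaussian and a heavy-tailed residual model can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: it assumes a ridge regression on simulated markers as a stand-in for GBLUP, and it fits Student-t residuals by iteratively reweighted least squares, one common way to avoid Markov chain Monte Carlo sampling. All function names, the fixed degrees of freedom `nu`, the MAD-based residual scale, the penalty grid, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def ridge(X, y, lam, w=None):
    """Weighted ridge: argmin sum_i w_i (y_i - x_i'b)^2 + lam * ||b||^2."""
    w = np.ones(len(y)) if w is None else w
    XtW = X.T * w  # scale each observation (column of X.T) by its weight
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

def robust_ridge_t(X, y, lam, nu=4.0, n_iter=50):
    """Ridge with Student-t residuals, fitted by iteratively reweighted least squares.

    Assumption: nu is fixed and the residual scale is re-estimated from the
    median absolute residual at each iteration (a rough robust choice).
    """
    b = ridge(X, y, lam)  # start from the Gaussian solution
    for _ in range(n_iter):
        r = y - X @ b
        s2 = (np.median(np.abs(r)) / 0.6745) ** 2 + 1e-12
        w = (nu + 1.0) / (nu + r**2 / s2)  # large residuals get small weights
        b = ridge(X, y, lam, w)
    return b

def cv_choose_lambda(X, y, lams, fit, k=5, seed=0):
    """Pick the penalty with the smallest k-fold median absolute prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lams:
        errs = []
        for f in folds:
            train = np.setdiff1d(np.arange(len(y)), f)
            b = fit(X[train], y[train], lam)
            errs.append(np.median(np.abs(y[f] - X[f] @ b)))
        scores.append(np.mean(errs))
    return lams[int(np.argmin(scores))]

# Simulated data (hypothetical): 200 records, 20 markers, 10% outlying phenotypes.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
b_true = rng.standard_normal(p)
y = X @ b_true + rng.standard_normal(n)
outliers = rng.choice(n, size=20, replace=False)
y[outliers] += 20.0 * rng.standard_normal(20)

lams = [0.1, 1.0, 10.0]
lam_gauss = cv_choose_lambda(X, y, lams, ridge)
lam_t = cv_choose_lambda(X, y, lams, robust_ridge_t)
b_gauss = ridge(X, y, lam_gauss)
b_rob = robust_ridge_t(X, y, lam_t)

# Squared distance of each estimate from the simulating marker effects.
err_gauss = float(np.sum((b_gauss - b_true) ** 2))
err_rob = float(np.sum((b_rob - b_true) ** 2))
print(f"Gaussian ridge error: {err_gauss:.3f}, t-residual ridge error: {err_rob:.3f}")
```

In this sketch the cross-validation criterion is the median absolute error rather than the mean squared error, so that outlying records in the validation folds do not dominate the choice of penalty. That is a design choice of the example, not something stated in the abstract.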