Pérez-Rodríguez Paulino, Acosta-Pech Rocío, Pérez-Elizalde Sergio, Cruz Ciro Velasco, Espinosa Javier Suárez, Crossa José
Colegio de Postgraduados, CP 56230, Montecillos, Edo. de México.
G3 (Bethesda). 2018 May 4;8(5):1771-1785. doi: 10.1534/g3.117.300406.
Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. For quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, the selection process is known to produce skewed distributions. There is a vast statistical literature on skewed distributions; the skew normal distribution is of particular interest in this research. It includes a third parameter that drives the skewness, so it generalizes the normal distribution. We propose an extension of Bayesian whole-genome regression to skew normal data in the context of GS applications, where the number of predictors usually vastly exceeds the sample size; the model can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows standard Markov Chain Monte Carlo (MCMC) techniques to be used to fit the proposed model efficiently. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained with the Bayesian Ridge Regression model. Results indicate that the proposed model fits better than the conventional Bayesian Ridge Regression model according to the deviance information criterion (DIC), and predicts as well as that model according to cross-validation. A computing program to fit the proposed model, coded in the R statistical package and the C programming language, is available as supplementary material.
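To illustrate what a stochastic representation of a skew normal random variable looks like, here is a minimal sketch in Python, not the authors' R/C program. It assumes the widely used representation Z = δ|U0| + sqrt(1 − δ²)·U1, with U0 and U1 independent standard normal and δ = λ/sqrt(1 + λ²) for shape parameter λ; the parametrization used in the paper may differ. The function names, the seed, and the value λ = 4 are hypothetical choices made for this example. Conditional on the latent half-normal term |U0|, Z is normally distributed, which is the kind of structure that lets standard Gibbs-type updates be applied.

```python
import math
import random


def sample_skew_normal(shape, rng):
    """Draw one standard skew normal variate with shape parameter `shape`.

    Assumed representation: Z = delta*|U0| + sqrt(1 - delta^2)*U1,
    where U0, U1 are independent N(0, 1) and delta = shape/sqrt(1 + shape^2).
    """
    delta = shape / math.sqrt(1.0 + shape * shape)
    u0 = abs(rng.gauss(0.0, 1.0))  # latent half-normal component
    u1 = rng.gauss(0.0, 1.0)       # independent normal component
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1


def theoretical_mean(shape):
    """Mean of the standard skew normal: delta * sqrt(2/pi)."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    return delta * math.sqrt(2.0 / math.pi)


rng = random.Random(2018)  # fixed seed so the run is reproducible
shape = 4.0                # hypothetical shape value; positive means right skew
draws = [sample_skew_normal(shape, rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)

# The Monte Carlo mean should be close to the closed-form mean.
print(sample_mean, theoretical_mean(shape))
```

Setting `shape` to 0 gives δ = 0, so the draw reduces to an ordinary standard normal variate, which is the sense in which the skew normal generalizes the normal distribution.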