Regularized quantile regression applied to genome-enabled prediction of quantitative traits.

Author information

Nascimento M, E Silva F F, de Resende M D V, Cruz C D, Nascimento A C C, Viana J M S, Azevedo C F, Barroso L M A

Affiliations

Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, MG, Brasil.

Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, MG, Brasil.

Publication information

Genet Mol Res. 2017 Mar 22;16(1):gmr-16-01-gmr.16019538. doi: 10.4238/gmr16019538.

DOI:10.4238/gmr16019538

PMID:28340274

Abstract

Genomic selection (GS) is a variant of marker-assisted selection in which genetic markers covering the whole genome are used to predict the genetic merit of individuals for breeding. GS increases the accuracy of breeding value (BV) prediction. Although a variety of statistical models have been proposed to estimate BV in GS, few methodologies have addressed the statistical challenges posed by non-normal phenotypic distributions, e.g., skewed distributions. Traditional GS models estimate changes in the mean of the phenotype distribution, i.e., the regression function is defined for the expected value of the trait conditional on the markers, E(Y|X). We proposed an approach based on regularized quantile regression (RQR) for GS to improve the estimation of marker effects and, consequently, of the genomic estimated BV (GEBV). The RQR model is based on conditional quantiles, Q(Y|X), which makes it possible to fit models to any portion of the trait's probability distribution. This allows RQR to choose the quantile function that "best" represents the relationship between the dependent and independent variables. Data were simulated for 1000 individuals. The genome comprised 1500 markers; most were simulated with a small effect and only a few with a sizable effect. We evaluated three scenarios, defined by symmetrical, positively skewed, and negatively skewed distributions. Analyses were performed using Bayesian LASSO (BLASSO) and RQR at three quantiles (0.25, 0.50, and 0.75). RQR was efficient for estimating GEBV: in every scenario evaluated, at least one of the fitted quantile models achieved better results than BLASSO. The gains relative to BLASSO were 86.28% and 55.70% for the positively and negatively skewed distributions, respectively.
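
As a rough illustration of the contrast between E(Y|X) and Q(Y|X) drawn in the abstract, the following Python sketch writes out the check (pinball) loss that quantile regression minimizes, together with an L1-penalized objective of the kind commonly used for regularized quantile regression. The abstract does not state the penalty, the solver, or the software the authors used, so the L1 penalty, the function names (`pinball_loss`, `rqr_objective`, `best_constant`), and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the names, the L1 penalty form, and the toy data
# are assumptions, not taken from the paper.


def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for a single residual."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def rqr_objective(y, X, intercept, beta, tau, lam):
    """Total check loss plus an L1 penalty on the marker effects.

    y: phenotypes, X: rows of marker codes, beta: one effect per marker.
    The intercept is left unpenalized (an assumption of this sketch).
    """
    fit = 0.0
    for yi, xi in zip(y, X):
        predicted = intercept + sum(b * x for b, x in zip(beta, xi))
        fit += pinball_loss(yi - predicted, tau)
    penalty = lam * sum(abs(b) for b in beta)
    return fit + penalty


def best_constant(y, tau):
    """Intercept-only fit: the observed value minimizing the total check loss.

    Searching over the observed values is enough here because the total
    check loss is piecewise linear in the constant, with kinks at the data.
    """
    return min(y, key=lambda c: sum(pinball_loss(yi - c, tau) for yi in y))


# Toy positively skewed sample (hypothetical values).
y_skewed = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_fit = sum(y_skewed) / len(y_skewed)
quantile_fits = {tau: best_constant(y_skewed, tau) for tau in (0.25, 0.50, 0.75)}
```

On this toy sample the mean is 22.0, pulled upward by the single large value, whereas the loss-minimizing constants at quantiles 0.25, 0.50, and 0.75 are 2.0, 3.0, and 4.0. This is the kind of behaviour that makes quantile-based fits attractive for skewed phenotypes. A real analysis would minimize `rqr_objective` over the intercept and all marker effects with a suitable optimizer, which this sketch does not attempt.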
