Rabier Charles-Elie, Barre Philippe, Asp Torben, Charmet Gilles, Mangin Brigitte
MIAT, Université de Toulouse, INRA, Castanet-Tolosan, France.
UR4, INRA, Unité de Recherche Pluridisciplinaire, Prairies et Plantes Fourragères, Lusignan, France.
PLoS One. 2016 Jun 20;11(6):e0156086. doi: 10.1371/journal.pone.0156086. eCollection 2016.
Genomic selection aims to predict the breeding values of selection candidates using a high density of markers. It relies on the assumption that every quantitative trait locus (QTL) tends to be in strong linkage disequilibrium (LD) with at least one marker. In this context, we present theoretical results on the accuracy of genomic selection, i.e., the correlation between predicted and true breeding values. Typically, the breeding values of the so-called test individuals are predicted from their markers, using marker effects estimated by fitting a ridge regression model to a set of training individuals. We present a theoretical expression for the accuracy; this expression is valid for any configuration of LD between QTLs and markers. We also introduce a new accuracy proxy that is free of the QTL parameters and easily computable; it outperforms the proxies suggested in the literature, in particular those based on an estimated effective number of independent loci (Me). The theoretical formula, the new proxy, and existing proxies were compared on simulated data, and the results support the validity of our approach. The calculations were also illustrated on a new perennial ryegrass data set (367 individuals) genotyped for 24,957 single nucleotide polymorphisms (SNPs). In this case, most of the proxies studied yielded similar results because the markers were too few to cover the entire genome (2.7 Gb).
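The accuracy discussed above, i.e. the correlation between predicted and true breeding values when marker effects come from a ridge regression fitted to training individuals, can be illustrated empirically with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not taken from the paper: markers are simulated independently (so there is no LD structure), the QTLs are a random subset of the markers themselves, and the ridge penalty `lam` is an arbitrary value. All function names and parameters are hypothetical. It does not implement the theoretical formula or the proxy proposed by the authors.

```python
import numpy as np


def simulate_genotypes(rng, n, p):
    """Simulate n individuals at p independent biallelic markers coded 0/1/2.

    Independence between markers is a simplifying assumption of this sketch.
    """
    freqs = rng.uniform(0.1, 0.9, size=p)
    return rng.binomial(2, freqs, size=(n, p)).astype(float)


def ridge_marker_effects(X, y, lam):
    """Ridge regression estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def empirical_accuracy(seed=0, n_train=300, n_test=200, p=500,
                       n_qtl=20, h2=0.5, lam=500.0):
    """Correlation between predicted and true breeding values on test individuals."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = simulate_genotypes(rng, n, p)

    # Assumption: QTLs are a random subset of the genotyped markers.
    qtl_idx = rng.choice(p, size=n_qtl, replace=False)
    qtl_effects = rng.normal(size=n_qtl)
    tbv = X[:, qtl_idx] @ qtl_effects  # true breeding values

    # Add environmental noise so that the heritability is roughly h2.
    var_e = tbv.var() * (1.0 - h2) / h2
    y = tbv + rng.normal(scale=np.sqrt(var_e), size=n)

    X_train, X_test = X[:n_train], X[n_train:]
    y_train = y[:n_train]

    # Center markers and phenotypes using training-set means only.
    marker_means = X_train.mean(axis=0)
    beta = ridge_marker_effects(X_train - marker_means,
                                y_train - y_train.mean(), lam)

    predicted = (X_test - marker_means) @ beta
    return np.corrcoef(predicted, tbv[n_train:])[0, 1]


if __name__ == "__main__":
    print("Empirical accuracy:", empirical_accuracy())
```

Repeating `empirical_accuracy` over several seeds gives a rough Monte Carlo estimate of the expected accuracy, which is the kind of quantity a theoretical expression or proxy would be compared against in a simulation study.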