Zhong Shengqiang, Dekkers Jack C M, Fernando Rohan L, Jannink Jean-Luc
Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA.
Genetics. 2009 May;182(1):355-64. doi: 10.1534/genetics.108.098277. Epub 2009 Mar 18.
We compared the accuracies of four genomic-selection prediction methods as affected by marker density, level of linkage disequilibrium (LD), quantitative trait locus (QTL) number, sample size, and level of replication in populations generated from multiple inbred lines. Marker data on 42 two-row spring barley inbred lines were used to simulate high and low LD populations from multiple inbred line crosses: the first included many small full-sib families and the second was derived from five generations of random mating. True breeding values (TBV) were simulated on the basis of 20 or 80 additive QTL. Methods used to derive genomic estimated breeding values (GEBV) were random regression best linear unbiased prediction (RR-BLUP), Bayes-B, a Bayesian shrinkage regression method, and BLUP from a mixed model analysis using a relationship matrix calculated from marker data. Using the best methods, accuracies of GEBV were comparable to accuracies from phenotype for predicting TBV without requiring the time and expense of field evaluation. We identified a trade-off between a method's ability to capture marker-QTL LD vs. marker-based relatedness of individuals. The Bayesian shrinkage regression method primarily captured LD, the BLUP methods captured relationships, while Bayes-B captured both. Under most of the study scenarios, mixed-model analysis using a marker-derived relationship matrix (BLUP) was more accurate than methods that directly estimated marker effects, suggesting that relationship information was more valuable than LD information. When markers were in strong LD with large-effect QTL, or when predictions were made on individuals several generations removed from the training data set, however, the ranking of method performance was reversed and BLUP had the lowest accuracy.
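As a rough illustration of two of the BLUP-type predictors named above, the following minimal sketch computes GEBV by ridge regression on marker effects (RR-BLUP) and by BLUP with a marker-derived relationship matrix, then scores each as the correlation between GEBV and TBV. Everything in it is an assumption made for illustration and is not taken from the study: the genotypes are random and unlinked rather than the barley marker data, the function names and the -1/1 genotype coding are hypothetical, the relationship matrix is taken as G = ZZ'/m, the overall mean is handled by simply subtracting the training-sample mean, and the shrinkage parameter is set from an assumed heritability instead of being estimated.

```python
import numpy as np


def simulate(n=200, m=500, n_qtl=20, h2=0.5, seed=0):
    """Toy data: random -1/1 genotypes for inbred lines, additive QTL (illustrative only)."""
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(n, m))           # n lines x m markers
    qtl = rng.choice(m, size=n_qtl, replace=False)     # markers acting as QTL
    tbv = Z[:, qtl] @ rng.normal(size=n_qtl)           # true breeding values
    var_e = tbv.var() * (1.0 - h2) / h2                # residual variance giving heritability h2
    y = tbv + rng.normal(scale=np.sqrt(var_e), size=n)
    return Z, tbv, y


def rr_blup(Z_train, y_train, Z_test, lam):
    """Ridge regression on markers: u_hat = Z'(ZZ' + lam*I)^-1 (y - mean)."""
    n = Z_train.shape[0]
    resid = y_train - y_train.mean()
    alpha = np.linalg.solve(Z_train @ Z_train.T + lam * np.eye(n), resid)
    marker_effects = Z_train.T @ alpha
    return Z_test @ marker_effects


def gblup(Z_train, y_train, Z_test, lam):
    """BLUP with G = ZZ'/m: g_hat = G_test,train (G_train + (lam/m)*I)^-1 (y - mean)."""
    n, m = Z_train.shape
    resid = y_train - y_train.mean()
    G_tt = Z_train @ Z_train.T / m                     # relationships among training lines
    G_st = Z_test @ Z_train.T / m                      # test-by-training relationships
    return G_st @ np.linalg.solve(G_tt + (lam / m) * np.eye(n), resid)


def accuracy(gebv, tbv):
    """Prediction accuracy as the correlation between GEBV and TBV."""
    return float(np.corrcoef(gebv, tbv)[0, 1])


Z, tbv, y = simulate()
train, test = slice(0, 150), slice(150, 200)
h2 = 0.5
lam = Z.shape[1] * (1.0 - h2) / h2                     # assumed variance ratio, not estimated

gebv_rr = rr_blup(Z[train], y[train], Z[test], lam)
gebv_g = gblup(Z[train], y[train], Z[test], lam)
print("RR-BLUP accuracy:", accuracy(gebv_rr, tbv[test]))
print("G-BLUP accuracy:", accuracy(gebv_g, tbv[test]))
```

With this particular choice of G and shrinkage, the two functions are algebraically the same predictor and return identical GEBV; a relationship matrix built differently from the markers would not generally coincide with RR-BLUP. The sketch does not attempt Bayes-B or the Bayesian shrinkage regression, which require sampling-based estimation.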