Scutari Marco, Mackay Ian, Balding David
Genetics Institute, University College London (UCL), London, UK.
Stat Appl Genet Mol Biol. 2013 Aug;12(4):517-27. doi: 10.1515/sagmb-2013-0002.
We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performances using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well and were the best-performing methods in our study.
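To make the second approach concrete, the sketch below computes kinship matrices under different combinations of centring and scaling and uses them for GBLUP prediction. It is a minimal illustration under stated assumptions, not the paper's exact definitions: genotypes are assumed coded 0/1/2, the kinship is assumed to take the form K = ZZ^T/p, where Z is the (optionally centred and/or scaled) n × p marker matrix, and the variance ratio λ = σ_e²/σ_g² is assumed known. The function names `kinship` and `gblup_predict`, the simulated data and the value of λ are hypothetical. The overall mean μ is estimated by the training-set mean as a simplification (a full mixed-model fit would use its generalised least squares estimate). The LD-adjusted kinship proposed in the paper is not reproduced here.

```python
import numpy as np


def kinship(X, centre=True, scale=False):
    """Hypothetical kinship K = Z Z^T / p from an n x p genotype matrix X (0/1/2).

    centre: subtract each marker's mean across individuals.
    scale:  divide each marker by its standard deviation across individuals.
    """
    Z = np.asarray(X, dtype=float)
    if centre:
        Z = Z - Z.mean(axis=0)
    if scale:
        sd = Z.std(axis=0)      # population SD (ddof=0); unaffected by centring
        sd[sd == 0] = 1.0       # leave monomorphic markers unscaled
        Z = Z / sd
    return Z @ Z.T / Z.shape[1]


def gblup_predict(K, y_train, train_idx, test_idx, lam):
    """BLUP of test genetic values given kinship K over train + test individuals.

    Model (assumed): y = mu + g + e, g ~ N(0, s_g^2 K), e ~ N(0, s_e^2 I),
    lam = s_e^2 / s_g^2. Prediction: mu + K_vt (K_tt + lam I)^{-1} (y - mu).
    """
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_vt = K[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()         # simplification: mean instead of GLS estimate
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + K_vt @ alpha


# Usage on simulated data (hypothetical sizes, allele frequency and effects).
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = rng.normal(0.0, 0.1, size=p)
y = X @ beta + rng.normal(0.0, 1.0, size=n)
train, test = np.arange(150), np.arange(150, 200)

for centre, scale in [(False, False), (True, False), (True, True)]:
    K = kinship(X, centre=centre, scale=scale)
    pred = gblup_predict(K, y[train], train, test, lam=1.0)
    r = np.corrcoef(pred, y[test])[0, 1]
    print(f"centre={centre}, scale={scale}: predictive correlation {r:.3f}")
```

With centring and scaling, each marker contributes equally to K regardless of its allele frequency and the diagonal of K averages exactly 1. Without scaling, common variants carry more weight. Which choice predicts best is an empirical question that depends on the trait and population, which is what the comparison described above addresses.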