Bermingham M L, Pong-Wong R, Spiliopoulou A, Hayward C, Rudan I, Campbell H, Wright A F, Wilson J F, Agakov F, Navarro P, Haley C S
MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh.
The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh.
Sci Rep. 2015 May 19;5:10312. doi: 10.1038/srep10312.
In this study, we investigated the effect of five feature selection approaches on the performance of a mixed model (G-BLUP) and a Bayesian (Bayes C) prediction method. We predicted height, high-density lipoprotein cholesterol (HDL) and body mass index (BMI) within 2,186 Croatian individuals and into 810 UK individuals using genome-wide SNP data. Using all SNP information, Bayes C and G-BLUP had similar predictive performance across all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the UK data. Bayes C outperformed G-BLUP in the prediction of HDL, which is influenced by loci of moderate effect size, in the UK data. Supervised feature selection of a SNP subset in the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C, but careful evaluation of predictive performance is required when supervised feature selection has been used.
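The caution about evaluating supervised feature selection can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotypes and trait are simulated, the selection rule (top SNPs by absolute marginal correlation) is only one possible supervised filter, and ridge regression on standardised SNPs stands in for G-BLUP, to which it is commonly treated as equivalent when the variance ratio is fixed. The function names, sample sizes and penalty values `lam` are all assumptions made for illustration. The point shown is that SNPs are ranked using the training individuals only, so the held-out correlation is not inflated by selection.

```python
import numpy as np


def standardize(train, test):
    """Centre and scale SNP columns using training-set statistics only."""
    mean = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    return (train - mean) / sd, (test - mean) / sd


def select_top_snps(X_train, y_train, k):
    """Indices of the k SNPs with the largest absolute marginal
    association with the trait, computed on training data only."""
    yc = y_train - y_train.mean()
    score = np.abs(X_train.T @ yc)
    return np.argsort(score)[::-1][:k]


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on SNP effects, solved in its n x n dual form.
    lam is an assumed variance ratio, not a REML estimate."""
    mu = y_train.mean()
    K = X_train @ X_train.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - mu)
    return mu + X_test @ X_train.T @ alpha


# Simulated data (hypothetical sizes, far smaller than the study's).
rng = np.random.default_rng(0)
n_train, n_test, m, n_causal = 300, 100, 1000, 20
freq = rng.uniform(0.1, 0.9, size=m)
G = rng.binomial(2, freq, size=(n_train + n_test, m)).astype(float)
beta = np.zeros(m)
causal = rng.choice(m, n_causal, replace=False)
beta[causal] = rng.normal(0.0, 1.0, n_causal)
g = G @ beta
y = g + rng.normal(0.0, g.std(), size=g.shape)  # roughly half the variance is genetic

X_tr, X_te = standardize(G[:n_train], G[n_train:])
y_tr, y_te = y[:n_train], y[n_train:]

# Feature selection sees the training individuals only.
keep = select_top_snps(X_tr, y_tr, k=100)

pred_all = ridge_predict(X_tr, y_tr, X_te, lam=float(m))
pred_sel = ridge_predict(X_tr[:, keep], y_tr, X_te[:, keep], lam=float(len(keep)))

r_all = np.corrcoef(pred_all, y_te)[0, 1]
r_sel = np.corrcoef(pred_sel, y_te)[0, 1]
print(f"all SNPs: r = {r_all:.3f}; selected subset: r = {r_sel:.3f}")
```

Ranking SNPs on the combined training and test individuals before splitting would let information about the test phenotypes leak into the model, which is one way the predictive performance of a selected subset can be overstated.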