D'Angelo Gina M, Rao DC, Gu C Charles
Division of Biostatistics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S62. doi: 10.1186/1753-6561-3-s7-s62.
Variable selection in genome-wide association studies is statistically challenging because there are more variables than subjects. We propose an approach that uses principal-component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO) to identify gene-gene interactions in genome-wide association studies. PCA was first used to reduce the dimension of the single-nucleotide polymorphisms (SNPs) within each gene. The interactions of the gene PCA scores were then entered into the LASSO to determine whether any gene-gene signals exist. We extended the PCA-LASSO approach with the bootstrap to estimate the standard errors and confidence intervals of the LASSO coefficient estimates. This method was compared with entering the raw SNP values into the LASSO and with the logistic model with individual gene-gene interactions. We demonstrated these methods on the Genetic Analysis Workshop 16 rheumatoid arthritis genome-wide association study data, and our results identified a few gene-gene signals. Based on these results, the PCA-LASSO method shows promise in identifying gene-gene interactions; at this time, we suggest using it alongside conventional approaches, such as generalized linear models, to narrow down genetic signals.
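The pipeline described in the abstract (per-gene PCA, interaction terms, LASSO, bootstrap) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the simulated genotypes, the use of only the first principal component per gene, the inclusion of main-effect scores alongside pairwise products, the penalty value `lam`, the simple proximal-gradient solver for the L1-penalized logistic model, and the choice to resample subjects after the PCA step. The helper names `first_pc_score`, `interaction_features`, `standardize`, and `lasso_logistic` are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def first_pc_score(snps):
    """Score of each subject on the first principal component of one gene's SNPs."""
    centered = snps - snps.mean(axis=0)
    # Rows of vt are the principal axes; vt[0] is the PC1 loading vector.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def interaction_features(gene_scores):
    """Stack per-gene scores and all pairwise products (gene-gene interactions)."""
    scores = np.column_stack(gene_scores)
    pairs = list(itertools.combinations(range(scores.shape[1]), 2))
    products = np.column_stack([scores[:, i] * scores[:, j] for i, j in pairs])
    return np.column_stack([scores, products]), pairs

def standardize(x):
    """Center each column and scale it to unit standard deviation."""
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (x - x.mean(axis=0)) / sd

def lasso_logistic(x, y, lam, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent (assumed solver)."""
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])  # unpenalized intercept in column 0
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(design, 2) ** 2
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ coef))
        coef = coef - step * design.T @ (prob - y) / n
        # Soft-threshold every coefficient except the intercept.
        coef[1:] = np.sign(coef[1:]) * np.maximum(np.abs(coef[1:]) - step * lam, 0.0)
    return coef[1:]

# Simulated data (hypothetical): 200 subjects, 3 genes, 5 SNPs each, coded 0/1/2.
n_subjects, n_genes, snps_per_gene = 200, 3, 5
genes = [rng.integers(0, 3, size=(n_subjects, snps_per_gene)).astype(float)
         for _ in range(n_genes)]

scores = [first_pc_score(g) for g in genes]
features, pairs = interaction_features(scores)
features = standardize(features)

# Case/control status driven by the gene 0 x gene 1 interaction column.
logit = 1.0 * features[:, n_genes]
y = (rng.random(n_subjects) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

lam = 0.05  # assumed penalty; in practice it would be tuned, e.g. by cross-validation
coef = lasso_logistic(features, y, lam)

# Bootstrap: resample subjects with replacement and refit the LASSO each time.
n_boot = 50
boot = np.empty((n_boot, features.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n_subjects, size=n_subjects)
    boot[b] = lasso_logistic(standardize(features[idx]), y[idx], lam)

se = boot.std(axis=0, ddof=1)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)

labels = [f"gene{g}" for g in range(n_genes)] + [f"gene{i}xgene{j}" for i, j in pairs]
for name, c, s, lo, hi in zip(labels, coef, se, ci_low, ci_high):
    print(f"{name:12s} coef={c:+.3f} se={s:.3f} 95% CI=({lo:+.3f}, {hi:+.3f})")
```

In this sketch an interaction would be flagged when its percentile interval excludes zero. Bootstrap intervals for LASSO coefficients can behave poorly for coefficients that are shrunk exactly to zero, which is one reason to read them alongside a conventional model, as the abstract suggests.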