Wang Yuanjia, Sha Nanshi, Fang Yixin
Department of Biostatistics, School of Public Health, Columbia University, 722 West 168th Street, New York, NY 10032, USA.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S16. doi: 10.1186/1753-6561-3-s7-s16.
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple-comparisons adjustment. Multivariate logistic regression has been proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA study while controlling for collinearity and overfitting in a high-dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1 and L2 penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, the problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
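The connection between priors and penalties mentioned in the abstract can be illustrated with the standard maximum a posteriori (MAP) argument. The abstract does not name the priors, so the sketch below assumes the usual textbook choices: independent Gaussian priors, which give an L2 (ridge) penalty, and independent Laplace (double-exponential) priors, which give an L1 (lasso) penalty. The notation (y_i for case-control status, x_i for the genotype vector, beta for the coefficients, tau and b for the prior scales) is introduced here for illustration and is not taken from the paper; the intercept is omitted for brevity.

```latex
% Assumed notation (not from the abstract):
%   y_i \in \{0,1\}  case-control status of subject i
%   x_i \in \mathbb{R}^p  coded genotypes at p SNPs
%   \beta \in \mathbb{R}^p  regression coefficients (intercept omitted)
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \Bigl[ y_i\, x_i^{\top}\beta
               - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]
  && \text{(logistic log-likelihood)} \\[4pt]
\hat{\beta}_{\mathrm{MAP}} &= \arg\max_{\beta}
  \Bigl\{ \ell(\beta) + \sum_{j=1}^{p} \log \pi(\beta_j) \Bigr\}
  && \text{(posterior mode, independent priors)} \\[4pt]
\pi(\beta_j) = \tfrac{1}{\sqrt{2\pi\tau^{2}}}\, e^{-\beta_j^{2}/(2\tau^{2})}
  \;&\Longrightarrow\;
  \hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta}
  \Bigl\{ \ell(\beta) - \tfrac{1}{2\tau^{2}} \sum_{j=1}^{p} \beta_j^{2} \Bigr\}
  && \text{(Gaussian prior, L2 penalty)} \\[4pt]
\pi(\beta_j) = \tfrac{1}{2b}\, e^{-|\beta_j|/b}
  \;&\Longrightarrow\;
  \hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta}
  \Bigl\{ \ell(\beta) - \tfrac{1}{b} \sum_{j=1}^{p} |\beta_j| \Bigr\}
  && \text{(Laplace prior, L1 penalty)}
\end{align*}
```

Under these assumptions the penalty weight is determined by the prior scale: a smaller tau or b corresponds to a tighter prior around zero and therefore stronger shrinkage. With the Laplace prior, the posterior mode can set some coefficients exactly to zero, which is one way such a model can be used to select SNPs.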