Department of Epidemiology and Health Statistics, School of Public Health, Shandong University, Jinan 250012, China.
BMC Genet. 2011 Aug 26;12:75. doi: 10.1186/1471-2156-12-75.
In genetic association studies, especially GWAS, gene- or region-based methods have become increasingly popular for detecting associations between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been used successfully to classify gene expression data. However, the purpose of an association study is to detect the correlation between genetic variation and disease rather than to classify samples, and genomic data are categorical rather than numerical. A kernel-based logistic regression model has recently been proposed for association studies; it projects the nonlinear original SNP data into a linear feature space, but it is still affected by multicollinearity between the projections, which may lead to a loss of power. We therefore propose a KPCA-LRT model that avoids this multicollinearity.
Simulation results showed that KPCA-LRT was more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) across all sample sizes, significance levels and relative risks considered, especially at the gene-wide significance level (1E-5) and at lower relative risks (RR = 1.2, 1.3). Application to four gene regions of the rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than both the single-locus test and PCA-LRT.
KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data, especially at lower relative risks and lower significance levels.
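
The KPCA-LRT workflow described in the background (kernel projection of the SNP data, extraction of uncorrelated principal components, then a logistic-regression likelihood ratio test) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the kernel, the number of retained components, or the genotype coding, so the RBF kernel, the default `gamma`, `n_components=2`, the 0/1/2 genotype coding, and all function names here are assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pcs(K, n_components):
    """Scores of the leading kernel principal components.

    The columns are mutually orthogonal, which is what removes the
    multicollinearity between the kernel projections.
    """
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    Kc = J @ K @ J                             # kernel centered in feature space
    Kc = (Kc + Kc.T) / 2.0                     # enforce exact symmetry
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def logistic_loglik(Z, y, n_iter=100, tol=1e-10):
    """Maximized log-likelihood of logistic regression (intercept + Z),
    fitted by Newton-Raphson."""
    n = len(y)
    A = np.hstack([np.ones((n, 1)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        # Tiny ridge on the Hessian for numerical stability only.
        H = A.T @ (A * w[:, None]) + 1e-8 * np.eye(A.shape[1])
        step = np.linalg.solve(H, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(A @ beta))), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def kpca_lrt(G, y, n_components=2, gamma=None):
    """Likelihood ratio statistic and degrees of freedom for one gene/region.

    G: (n_samples, n_snps) genotypes, assumed coded 0/1/2.
    y: (n_samples,) case/control status coded 1/0.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    if gamma is None:
        gamma = 1.0 / G.shape[1]               # assumed default bandwidth
    Z = kernel_pcs(rbf_kernel(G, gamma), n_components)
    ll_full = logistic_loglik(Z, y)
    ll_null = logistic_loglik(np.empty((len(y), 0)), y)  # intercept only
    return 2.0 * (ll_full - ll_null), n_components
```

Under the null hypothesis of no association, the returned statistic would be compared with a chi-square distribution on `n_components` degrees of freedom (for example with `scipy.stats.chi2.sf`) to obtain a gene-level p-value. Replacing `rbf_kernel` with the linear kernel `G @ G.T` gives a PCA-LRT-style test, which is the comparison method named in the results.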