Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
PLoS One. 2013 Sep 30;8(9):e75897. doi: 10.1371/journal.pone.0075897. eCollection 2013.
Genome-wide association studies (GWAS) are widely used to identify genetic variants associated with disease risk. To address the limitations of single-locus association analysis, many approaches have been proposed that test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously. Kernel machine based SNP-set analysis is more powerful than single-locus analysis because it borrows information from SNPs correlated with causal or tag SNPs. Four types of kernel machine functions and a principal component analysis (PCA) based approach have also been compared. However, because low minor allele frequencies (MAF) cause a loss of power, we extended PCA with a new method called weighted PCA (wPCA). We compared wPCA, the logistic kernel machine based test (LKM), and PCA on SNP sets under different MAFs and linkage disequilibrium (LD) structures. We also applied the three methods to two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, wPCA and the weighted identity-by-state kernel (wIBS) are more powerful than PCA and the other kernel machine functions across different LD structures and different numbers of causal SNPs. Application of the three methods to the real GWAS dataset indicates that wPCA and wIBS perform better than the linear kernel, the IBS kernel, and PCA.
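The idea of weighting a SNP set by allele frequency before extracting principal components can be illustrated with a minimal Python sketch. The abstract does not state which weighting scheme wPCA uses, so everything specific here is an assumption: the Beta(MAF; 1, 25) density weights (a choice common in kernel-based rare-variant tests), the 0/1/2 genotype coding, and the hypothetical function names `beta_weights` and `weighted_pca_scores`. It is not the authors' implementation.

```python
import math
import numpy as np


def beta_weights(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at each MAF (assumed weighting scheme).

    With a=1, b=25 the weight decreases as MAF grows, so rarer SNPs
    contribute more. Expects 0 < maf <= 0.5.
    """
    maf = np.asarray(maf, dtype=float)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(maf) + (b - 1.0) * np.log1p(-maf))


def weighted_pca_scores(G, n_components=2):
    """Principal component scores of a MAF-weighted SNP set.

    G is an (n_samples, n_snps) genotype matrix coded as 0/1/2 allele counts.
    Returns (scores, explained_variance_ratio, weights).
    """
    G = np.asarray(G, dtype=float)
    freq = G.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)   # fold to the minor allele
    keep = maf > 0                       # drop monomorphic SNPs
    w = beta_weights(maf[keep])

    # Center each SNP, then scale it by its weight.
    X = (G[:, keep] - G[:, keep].mean(axis=0)) * w

    # PCA via singular value decomposition of the weighted matrix.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, S.size)
    scores = U[:, :k] * S[:k]
    explained = (S ** 2)[:k] / np.sum(S ** 2)
    return scores, explained, w


if __name__ == "__main__":
    # Simulated genotypes for illustration only; not data from the study.
    rng = np.random.default_rng(0)
    true_maf = np.array([0.02, 0.05, 0.2, 0.4])
    G = rng.binomial(2, true_maf, size=(500, 4))
    scores, explained, w = weighted_pca_scores(G, n_components=2)
    print(scores.shape, explained, w)
```

In a SNP-set association test along these lines, the leading component scores would typically be entered as predictors in a logistic regression on case-control status and tested jointly; that step is omitted here.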