Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.
PLoS One. 2013 May 3;8(5):e62495. doi: 10.1371/journal.pone.0062495. Print 2013.
Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease on the basis of millions of single nucleotide polymorphisms (SNPs). Single-locus association tests lose power because of the heavy correction required for multiple comparisons. To avoid this, several methods group SNPs into a SNP set based on genomic features and then test the joint effect of the set. We compare the performance of four such methods: principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1, and ≥ 2 causal SNPs. Our simulation results show that all four methods control the type I error at the nominal significance level. SPCA is consistently more powerful than the other methods across the simulated settings of linkage disequilibrium structure and minor allele frequency. We also apply the four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
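The SNP-set idea in the abstract can be illustrated with a minimal sketch of a PCA-based set test. This is not the authors' implementation. The function names (`simulate_snp_set`, `pca_set_test`), the toy data generator, the 80% variance threshold, and the permutation p-value are all assumptions made for illustration. The sketch also scores association by the R² of the phenotype on the retained components, a linear stand-in for the regression model a case-control study would more likely use.

```python
import numpy as np


def simulate_snp_set(n=500, n_snps=10, maf=0.3, ld=0.8, beta=1.5, seed=1):
    """Toy SNP set (hypothetical generator, not the paper's simulation design).

    Each SNP copies a shared latent genotype with probability `ld`, which
    induces correlation between SNPs as a crude stand-in for LD.
    """
    rng = np.random.default_rng(seed)
    base = rng.binomial(2, maf, size=n)            # shared latent genotype
    own = rng.binomial(2, maf, size=(n, n_snps))   # independent genotypes
    copy = rng.random((n, n_snps)) < ld
    G = np.where(copy, base[:, None], own).astype(float)
    # SNP 0 is the single causal SNP; beta = 0 gives the null (0 causal SNPs).
    logit = -1.0 + beta * G[:, 0]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return G, y


def pca_set_test(G, y, var_explained=0.8, n_perm=999, seed=0):
    """Joint test of a SNP set through its leading principal components.

    Returns (number of components kept, observed R^2, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    Gc = G - G.mean(axis=0)                        # center each SNP
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)       # cumulative variance share
    k = min(int(np.searchsorted(cum, var_explained)) + 1, len(s))
    Uk = U[:, :k]                                  # orthonormal PC directions

    def r_squared(pheno):
        # The retained PCs are orthonormal and centered, so the R^2 of a
        # linear regression on them is the squared projection norm ratio.
        yc = pheno - pheno.mean()
        proj = Uk.T @ yc
        return float(proj @ proj / (yc @ yc))

    observed = r_squared(y)
    exceed = sum(r_squared(rng.permutation(y)) >= observed
                 for _ in range(n_perm))
    return k, observed, (exceed + 1) / (n_perm + 1)


G, y = simulate_snp_set()
k, r2, p = pca_set_test(G, y)
print(f"components kept: {k}, R^2: {r2:.3f}, permutation p-value: {p:.3f}")
```

Testing a handful of components instead of every SNP is what reduces the multiple-comparison burden: the set yields one p-value rather than one per SNP. Calling `simulate_snp_set(beta=0.0)` gives a null data set that can be used to check type I error under this toy design.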