Zhang Yu, Liu Jun S
Department of Statistics, The Pennsylvania State University, 422A Thomas Building, University Park, PA 16803.
J Am Stat Assoc. 2011 Sep 1;106(495):846-857. doi: 10.1198/jasa.2011.ap10657.
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When we applied the method to genome-wide SNP datasets, the p-value adjustments varied widely across genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online.
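The claim that Bonferroni correction is too conservative for correlated SNPs can be illustrated with a small simulation. The sketch below is not the Poisson de-clumping method described in the abstract; it is a plain Monte Carlo check under stated assumptions. It assumes null test statistics are z-scores with AR(1) correlation `rho` between neighbouring markers (a stand-in for LD), and the function names `simulate_min_p`, `familywise_error`, and `empirical_threshold` are hypothetical, chosen only for this example.

```python
import math
import random
from statistics import NormalDist


def simulate_min_p(m, rho, n_sims, seed=0):
    """Simulate n_sims null scans of m AR(1)-correlated z-scores.

    Returns the minimum two-sided p-value observed in each scan.
    The AR(1) structure is an assumption standing in for LD.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    innov_sd = math.sqrt(1.0 - rho * rho)  # keeps each z-score at unit variance
    min_ps = []
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)
        max_abs = abs(z)
        for _ in range(m - 1):
            z = rho * z + innov_sd * rng.gauss(0.0, 1.0)
            max_abs = max(max_abs, abs(z))
        # The smallest p-value in the scan belongs to the largest |z|.
        min_ps.append(2.0 * (1.0 - norm.cdf(max_abs)))
    return min_ps


def familywise_error(min_ps, threshold):
    """Fraction of null scans with at least one p-value <= threshold."""
    return sum(p <= threshold for p in min_ps) / len(min_ps)


def empirical_threshold(min_ps, alpha):
    """Per-test threshold whose familywise error is about alpha.

    This is the alpha-quantile of the simulated minimum p-values.
    """
    ordered = sorted(min_ps)
    k = max(int(alpha * len(ordered)) - 1, 0)
    return ordered[k]


if __name__ == "__main__":
    m, alpha, n_sims = 200, 0.05, 2000
    bonferroni = alpha / m
    for rho in (0.0, 0.99):
        min_ps = simulate_min_p(m, rho, n_sims, seed=1)
        print(
            f"rho={rho}: Bonferroni FWER={familywise_error(min_ps, bonferroni):.4f}, "
            f"simulated threshold={empirical_threshold(min_ps, alpha):.2e}"
        )
```

With independent markers the Bonferroni threshold should give a familywise error close to the nominal level, while under strong correlation it falls well below it, so a less stringent per-test threshold would still control the error. This brute-force simulation also hints at why permutation-style approaches become costly at genome-wide scale, since every replicate rescans all markers.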