Duggal Priya, Gillanders Elizabeth M, Holmes Taura N, Bailey-Wilson Joan E
Statistical Genetics Section, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD USA.
BMC Genomics. 2008 Oct 31;9:516. doi: 10.1186/1471-2164-9-516.
By assaying hundreds of thousands of single nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS) allow a powerful, unbiased survey of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple testing is necessary to control the family-wise Type I error in genetic association studies, it is not clear which method to use. One simple approach is to perform a Bonferroni correction using all n SNPs across the genome; however, this approach is highly conservative and would "overcorrect" for SNPs that are not truly independent. Many SNPs fall within regions of strong linkage disequilibrium (LD) ("blocks") and should not be considered "independent".
We proposed to approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of blocks (interblock SNPs). We examined the effective number of independent SNPs for GWAS panels. In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we could reduce the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively. To properly control the family-wise Type I error, we recommend the following p-value thresholds for "suggestive", "significant" and "highly significant" results, respectively: 10^-5, 10^-7 and 10^-8 for the Affymetrix 500K and Illumina 317K GWAS SNP panels, and 10^-6, 10^-7 and 10^-9 for the Phase II HapMap CEPH Utah and Yoruba populations.
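The counting rule described above (one SNP per LD block plus every interblock SNP) and the resulting Bonferroni threshold can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the block-label encoding (`None` for an interblock SNP) and the toy data are hypothetical, and the nominal alpha of 0.05 is an assumption, since the abstract does not state the error rate behind each threshold.

```python
from typing import Optional, Sequence


def effective_snp_count(block_ids: Sequence[Optional[int]]) -> int:
    """Count one SNP per LD block plus every SNP outside any block.

    block_ids holds one entry per SNP: an integer block label if the SNP
    lies in an LD block, or None if it is an interblock SNP.
    (Hypothetical encoding, chosen for illustration only.)
    """
    distinct_blocks = {b for b in block_ids if b is not None}
    interblock_snps = sum(1 for b in block_ids if b is None)
    return len(distinct_blocks) + interblock_snps


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold for a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Toy example: 9 SNPs, of which 6 fall in 3 blocks and 3 are interblock.
toy_block_ids = [0, 0, 0, None, 1, 1, None, None, 2]
toy_n_eff = effective_snp_count(toy_block_ids)  # 3 blocks + 3 interblock SNPs

# Effective test counts reported in the abstract for the CEU population.
ALPHA = 0.05  # assumed nominal family-wise error rate
affymetrix_threshold = bonferroni_threshold(ALPHA, 67_000)
illumina_threshold = bonferroni_threshold(ALPHA, 82_000)
naive_threshold = bonferroni_threshold(ALPHA, 500_000)  # all Affymetrix SNPs

print(toy_n_eff)
print(f"{affymetrix_threshold:.2e} {illumina_threshold:.2e} {naive_threshold:.2e}")
```

Under the assumed alpha of 0.05, both LD-adjusted counts give thresholds between 10^-7 and 10^-6, which is the order of magnitude of the recommended "significant" threshold, and both are less stringent than the threshold obtained by dividing by all 500,000 SNPs.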
By approximating the effective number of independent SNPs across the genome, we are able to "correct" for a more accurate number of tests and therefore develop "LD-adjusted" Bonferroni-corrected p-value thresholds that account for the interdependence of SNPs on widely used, commercially available SNP "chips". These thresholds will serve as guides to researchers trying to decide which regions of the genome should be studied further.