Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies.

Authors

Duggal Priya, Gillanders Elizabeth M, Holmes Taura N, Bailey-Wilson Joan E

Affiliation

Statistical Genetics Section, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD USA.

Publication

BMC Genomics. 2008 Oct 31;9:516. doi: 10.1186/1471-2164-9-516.

DOI:10.1186/1471-2164-9-516

PMID:18976480

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2621212/

Abstract

BACKGROUND

By assaying hundreds of thousands of single nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS) allow for a powerful, unbiased review of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple testing is necessary to control the family-wide Type 1 error in genetic association studies, it is not clear which method to use. One simple approach is to perform a Bonferroni correction using all n SNPs across the genome; however, this approach is highly conservative and would "overcorrect" for SNPs that are not truly independent. Many SNPs fall within regions of strong linkage disequilibrium (LD) ("blocks") and should not be considered "independent".

RESULTS

We proposed to approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of blocks (interblock SNPs). We examined the effective number of independent SNPs for GWAS panels. In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we could reduce the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively. For the Affymetrix 500K and Illumina 317K GWAS SNP panels we recommend using 10^-5, 10^-7 and 10^-8, and for the Phase II HapMap CEPH Utah and Yoruba populations we recommend using 10^-6, 10^-7 and 10^-9, as "suggestive", "significant" and "highly significant" p-value thresholds to properly control the family-wide Type 1 error.
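The counting rule and the resulting correction can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy block labels, and the family-wide error rate of 0.05 are assumptions made for illustration, and only the panel sizes and effective SNP counts come from the abstract.

```python
from typing import Optional, Sequence


def effective_tests(block_ids: Sequence[Optional[str]]) -> int:
    """Count one test per LD block plus one per interblock SNP.

    block_ids[i] is a hypothetical LD-block label for SNP i,
    or None when the SNP lies outside every block.
    """
    blocks = {b for b in block_ids if b is not None}
    interblock = sum(1 for b in block_ids if b is None)
    return len(blocks) + interblock


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wide error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Toy panel: 6 SNPs in two blocks (A, B) plus two interblock SNPs.
toy = ["A", "A", None, "B", "B", None]
n_eff = effective_tests(toy)  # 2 blocks + 2 interblock SNPs = 4

# Panel sizes and effective counts quoted in the abstract (CEU population).
panels = {
    "Affymetrix 500K": (500_000, 67_000),
    "Illumina 317K": (317_000, 82_000),
}
ALPHA = 0.05  # assumed family-wide error rate; not stated in the abstract
for name, (n_all, n_ind) in panels.items():
    naive = bonferroni_threshold(ALPHA, n_all)
    adjusted = bonferroni_threshold(ALPHA, n_ind)
    print(f"{name}: naive {naive:.1e}, LD-adjusted {adjusted:.1e}")
```

Under the assumed error rate of 0.05, the LD-adjusted values are roughly 7.5e-07 for the Affymetrix panel and 6.1e-07 for the Illumina panel, the same order of magnitude as the 10^-7 "significant" threshold recommended above.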

CONCLUSION

By approximating the effective number of independent SNPs across the genome, we are able to "correct" for a more accurate number of tests and therefore develop "LD adjusted" Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on well-utilized, commercially available SNP "chips". These thresholds will serve as guides to researchers trying to decide which regions of the genome should be studied further.
