Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets.

Affiliation

Department of Psychiatry, The University of Hong Kong, Pokfulam, Hong Kong.

Publication information

Hum Genet. 2012 May;131(5):747-56. doi: 10.1007/s00439-011-1118-2. Epub 2011 Dec 6.

DOI:10.1007/s00439-011-1118-2

PMID:22143225

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3325408/

Abstract

Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (M_e) for the adjustment of multiple testing, but current methods of calculation for M_e are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate M_e. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^-7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10^-8 for current or merged commercial genotyping arrays, ~10^-8 for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10^-8 for the common SNPs only within genes.
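The link between an effective number of tests and a genome-wide p-value threshold can be illustrated with a short sketch. It is not GEC's algorithm, which this record does not describe: as a swapped-in, well-known precursor method, it estimates M_e from the eigenvalues of an LD correlation matrix with the formula of Li and Ji (2005), then converts M_e into a per-SNP threshold with either a Bonferroni-type (α/M_e) or a Šidák correction. Which correction GEC applies is an assumption left open here. The function names and the toy LD structure (blocks of equally correlated SNPs) are hypothetical.

```python
import math


def block_eigenvalues(block_size, r):
    """Eigenvalues of a k x k correlation matrix whose off-diagonal entries
    all equal r (a hypothetical stand-in for one LD block).

    Such a matrix has one eigenvalue 1 + (k - 1) * r and k - 1 eigenvalues
    equal to 1 - r; it is a valid correlation matrix for -1/(k - 1) <= r <= 1.
    """
    k = block_size
    return [1 + (k - 1) * r] + [1 - r] * (k - 1)


def li_ji_me(eigenvalues):
    """Effective number of tests after Li and Ji (2005), used here as an
    illustrative substitute for GEC: M_e = sum_i f(|lambda_i|) with
    f(x) = I(x >= 1) + (x - floor(x))."""
    me = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        me += (1.0 if x >= 1 else 0.0) + (x - math.floor(x))
    return me


def bonferroni_threshold(me, alpha=0.05):
    """Per-SNP threshold alpha / M_e."""
    return alpha / me


def sidak_threshold(me, alpha=0.05):
    """Per-SNP threshold 1 - (1 - alpha)^(1 / M_e), computed with
    log1p/expm1 to avoid cancellation when M_e is in the millions."""
    return -math.expm1(math.log1p(-alpha) / me)


if __name__ == "__main__":
    # Hypothetical LD structure: 10 unlinked blocks of 5 SNPs each,
    # with pairwise correlation r = 0.8 inside every block.
    eigs = [lam for _ in range(10) for lam in block_eigenvalues(5, 0.8)]
    me = li_ji_me(eigs)
    print(f"M = {len(eigs)} SNPs, M_e = {me:.2f}")

    # How the genome-wide threshold scales with M_e at alpha = 0.05.
    for m_e in (5e5, 1e6, 5e6):
        print(f"M_e = {m_e:.0e}: alpha/M_e = {bonferroni_threshold(m_e):.2e}, "
              f"Sidak = {sidak_threshold(m_e):.2e}")
```

With α = 0.05, α/M_e equals 10^-7 when M_e = 5 × 10^5, 5 × 10^-8 when M_e = 10^6 and 10^-8 when M_e = 5 × 10^6, which shows, under a Bonferroni-type correction, roughly how many effective tests the thresholds quoted in the abstract correspond to; the Šidák form gives marginally larger thresholds. A real analysis would take eigenvalues from an LD correlation matrix estimated from genotype data rather than from the toy blocks used here.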


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a83/3325408/ad70da580e6a/439_2011_1118_Fig1_HTML.jpg

Similar articles

Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.

BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27.

Comprehensive evaluation of imputation performance in African Americans.

J Hum Genet. 2012 Jul;57(7):411-21. doi: 10.1038/jhg.2012.43. Epub 2012 May 31.

Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

Hum Genet. 2013 May;132(5):509-22. doi: 10.1007/s00439-013-1266-7. Epub 2013 Jan 22.

Quick, "imputation-free" meta-analysis with proxy-SNPs.

BMC Bioinformatics. 2012 Sep 12;13:231. doi: 10.1186/1471-2105-13-231.

Effect of genome-wide genotyping and reference panels on rare variants imputation.

J Genet Genomics. 2012 Oct 20;39(10):545-50. doi: 10.1016/j.jgg.2012.07.002. Epub 2012 Jul 24.

Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies.

BMC Genomics. 2008 Oct 31;9:516. doi: 10.1186/1471-2164-9-516.

Using family-based imputation in genome-wide association studies with large complex pedigrees: the Framingham Heart Study.

PLoS One. 2012;7(12):e51589. doi: 10.1371/journal.pone.0051589. Epub 2012 Dec 17.

Imputation reliability on DNA biallelic markers for drug metabolism studies.

BMC Bioinformatics. 2012;13 Suppl 14(Suppl 14):S7. doi: 10.1186/1471-2105-13-S14-S7. Epub 2012 Sep 7.

Genotype imputation for African Americans using data from HapMap phase II versus 1000 genomes projects.

Genet Epidemiol. 2012 Jul;36(5):508-16. doi: 10.1002/gepi.21647. Epub 2012 May 29.

Cited by

Identifying New Loci and Genes Associated with Feed Efficiency in Broilers.

Int J Mol Sci. 2025 Sep 1;26(17):8492. doi: 10.3390/ijms26178492.

Identification of new genomic loci for seed protein and oil content in the soybean pangenome using genome-wide association and haplotype analyses.

Theor Appl Genet. 2025 Sep 1;138(9):237. doi: 10.1007/s00122-025-05020-9.

Associations between the gut microbiota at one-year and neurodevelopment in children from the SEPAGES cohort.

Brain Behav Immun Health. 2025 Jul 18;48:101063. doi: 10.1016/j.bbih.2025.101063. eCollection 2025 Oct.

Exposure to per- and poly-fluoroalkyl substances in association to later occurrence of type 2 diabetes and metabolic pathway dysregulation in a multiethnic US population.

EBioMedicine. 2025 Aug;118:105838. doi: 10.1016/j.ebiom.2025.105838. Epub 2025 Jul 21.

Comprehensive analysis of 1,771 transcriptomes from 7 tissues enhance genetic and biological interpretations of maize complex traits.

G3 (Bethesda). 2025 Sep 3;15(9). doi: 10.1093/g3journal/jkaf140.

BMC Genomics. 2025 Jul 8;26(1):645. doi: 10.1186/s12864-025-11766-9.

The genetic architecture of temperature-induced partial fertility restoration in A cytoplasm in sorghum (Sorghum bicolor (L.) Moench).

Theor Appl Genet. 2025 Jul 2;138(7):170. doi: 10.1007/s00122-025-04946-4.

Genome-wide association analysis and gene mining of flavonoids in Xanthoceras sorbifolia.

Sci Rep. 2025 Jul 1;15(1):20808. doi: 10.1038/s41598-025-00514-4.

Identification of Advantaged Genes for Low-Nitrogen-Tolerance-Related Traits in Rice Using a Genome-Wide Association Study.

Int J Mol Sci. 2025 Jun 16;26(12):5749. doi: 10.3390/ijms26125749.

Reduced Cortical Surface Area in the Frontal Operculum as a Causal Risk Predictor for Chronic Pain.

Pain Res Manag. 2025 Jun 5;2025:4687197. doi: 10.1155/prm/4687197. eCollection 2025.

References

Linkage disequilibrium in finite populations.

Theor Appl Genet. 1968 Jun;38(6):226-31. doi: 10.1007/BF01245622.

GATES: a rapid and powerful gene-based association test using extended Simes procedure.

Am J Hum Genet. 2011 Mar 11;88(3):283-93. doi: 10.1016/j.ajhg.2011.01.019.

PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing.

Bioinformatics. 2010 Sep 1;26(17):2093-100. doi: 10.1093/bioinformatics/btq399. Epub 2010 Jul 6.

Association of JAG1 with bone mineral density and osteoporotic fractures: a genome-wide association study and follow-up replication studies.

Am J Hum Genet. 2010 Feb 12;86(2):229-39. doi: 10.1016/j.ajhg.2009.12.014. Epub 2010 Jan 21.

Sequencing technologies - the next generation.

Nat Rev Genet. 2010 Jan;11(1):31-46. doi: 10.1038/nrg2626. Epub 2009 Dec 8.

A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

PLoS Genet. 2009 Jun;5(6):e1000529. doi: 10.1371/journal.pgen.1000529. Epub 2009 Jun 19.

Genotyping technologies for genetic research.

Annu Rev Genomics Hum Genet. 2009;10:117-33. doi: 10.1146/annurev-genom-082908-150116.

Rapid and accurate multiple testing correction and power estimation for millions of correlated markers.

PLoS Genet. 2009 Apr;5(4):e1000456. doi: 10.1371/journal.pgen.1000456. Epub 2009 Apr 17.

A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests.

Genet Epidemiol. 2009 Nov;33(7):559-68. doi: 10.1002/gepi.20408.

Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies.

BMC Genomics. 2008 Oct 31;9:516. doi: 10.1186/1471-2164-9-516.
