Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets.

Affiliation

Department of Psychiatry, The University of Hong Kong, Pokfulam, Hong Kong.

Publication information

Hum Genet. 2012 May;131(5):747-56. doi: 10.1007/s00439-011-1118-2. Epub 2011 Dec 6.

Abstract

Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (M_e) for the adjustment of multiple testing, but current methods of calculation for M_e are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate M_e. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets, which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^-7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds of ~5 × 10^-8 for current or merged commercial genotyping arrays, ~10^-8 for all common SNPs in the 1000 Genomes Project dataset, and ~5 × 10^-8 for the common SNPs only within genes.
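
To make the link between an effective number of independent tests and a significance threshold concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the GEC implementation. The function names are hypothetical. The eigenvalue rule used here (count every marker, then subtract the excess of each eigenvalue above 1) is one eigenvalue-based estimator of this general kind, and the abstract does not spell out the exact formula. The Bonferroni-style division of 0.05 by the effective number is likewise an assumed correction. The eigenvalues are taken as given; in practice they would come from the SNP correlation (LD) matrix.

```python
def effective_tests(eigenvalues):
    """Estimate the effective number of independent tests.

    Assumed rule (not confirmed by the abstract): start from the number
    of markers and subtract the excess of each eigenvalue above 1.
    """
    m = len(eigenvalues)
    excess = sum(ev - 1.0 for ev in eigenvalues if ev > 1.0)
    return m - excess


def p_value_threshold(m_e, alpha=0.05):
    """Bonferroni-style threshold: family-wise error rate / effective tests."""
    return alpha / m_e


# Four uncorrelated SNPs: the correlation matrix is the identity,
# so every eigenvalue equals 1 and all four tests count.
independent = effective_tests([1.0, 1.0, 1.0, 1.0])

# Four SNPs in perfect LD: one eigenvalue of 4 and three of 0,
# so the four tests collapse to a single effective test.
redundant = effective_tests([4.0, 0.0, 0.0, 0.0])

# Hypothetical input: one million effective tests.
threshold = p_value_threshold(1_000_000)

print(independent, redundant, threshold)
```

Under this assumed Bonferroni-style correction, a threshold of about 5 × 10^-8 corresponds to roughly one million effective tests, and a threshold of about 10^-7 to roughly half a million.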

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a83/3325408/ad70da580e6a/439_2011_1118_Fig1_HTML.jpg
