Rapid and accurate multiple testing correction and power estimation for millions of correlated markers.

Authors

Han Buhm, Kang Hyun Min, Eskin Eleazar

Affiliations

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

Publication information

PLoS Genet. 2009 Apr;5(4):e1000456. doi: 10.1371/journal.pgen.1000456. Epub 2009 Apr 17.

DOI:10.1371/journal.pgen.1000456

PMID:19381255

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2663787/

Abstract

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose SLIDE, an accurate and efficient method for multiple testing correction in genome-wide association studies. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method, SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
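The general MVN idea mentioned in the abstract can be illustrated with a minimal sketch. It is not SLIDE itself: it has no sliding window and no tail correction, and it assumes the full marker correlation matrix fits in memory. The function names `mvn_corrected_pvalue` and `ar1_corr`, the AR(1)-style toy correlation structure, and all parameter values are assumptions made for illustration only. The sketch draws null statistics from N(0, R) and estimates the corrected p-value as the fraction of draws whose largest absolute statistic reaches the observed one.

```python
import numpy as np


def mvn_corrected_pvalue(z_obs, corr, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(max_i |Z_i| >= |z_obs|) for Z ~ N(0, corr).

    Illustrative only: this is the plain MVN sampling idea, not SLIDE.
    """
    rng = np.random.default_rng(seed)
    m = corr.shape[0]
    # A Cholesky factor turns independent normals into correlated ones;
    # the tiny diagonal jitter guards against numerical non-positive-definiteness.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(m))
    z = rng.standard_normal((n_samples, m)) @ chol.T
    max_abs = np.abs(z).max(axis=1)
    return float((max_abs >= abs(z_obs)).mean())


def ar1_corr(m, rho):
    """Toy correlation matrix (hypothetical): corr[i, j] = rho ** |i - j|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])


m, z_obs = 50, 3.0
p_indep = mvn_corrected_pvalue(z_obs, np.eye(m))        # uncorrelated markers
p_corr = mvn_corrected_pvalue(z_obs, ar1_corr(m, 0.9))  # strongly correlated markers
print(f"independent: {p_indep:.4f}  correlated: {p_corr:.4f}")
```

With strongly correlated markers the corrected p-value comes out smaller than in the independent case, because correlated tests behave like fewer effective tests. Ignoring correlation therefore makes the correction overly conservative.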




