Rapid and accurate multiple testing correction and power estimation for millions of correlated markers.

Author information

Han Buhm, Kang Hyun Min, Eskin Eleazar

Affiliation

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

Publication information

PLoS Genet. 2009 Apr;5(4):e1000456. doi: 10.1371/journal.pgen.1000456. Epub 2009 Apr 17.

Abstract

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies, SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method, SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
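
The abstract describes MVN-based correction only at a high level, so the following is a minimal sketch under stated assumptions, not SLIDE's implementation. It assumes the null marker statistics are approximately MVN(0, R), with R the marker correlation matrix, and estimates the corrected p-value of the most significant marker as P(max_i |Z_i| >= t) by Monte Carlo sampling. Each Z_i is drawn from its conditional normal distribution given only the preceding `window` markers, which is one plausible reading of "accounts for all correlation within a sliding window". The sketch omits the second correction the abstract names (the tail departure of the true null from the asymptotic distribution). All function names (`precompute_conditionals`, `corrected_pvalue`, `ar1_corr`) and the AR(1) example correlation are hypothetical.

```python
import math
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def precompute_conditionals(corr, window):
    """Hypothetical helper: for each marker i, return (W, weights, sd) such that
    Z_i | Z_W ~ N(weights . z_W, sd^2), where W holds at most `window` preceding markers."""
    conds = []
    for i in range(len(corr)):
        prev = list(range(max(0, i - window), i))
        if prev:
            sub = [[corr[a][b] for b in prev] for a in prev]  # Sigma_WW
            cross = [corr[i][b] for b in prev]                # Sigma_Wi
            w = solve(sub, cross)                             # Sigma_WW^{-1} Sigma_Wi
            var = corr[i][i] - sum(wk * ck for wk, ck in zip(w, cross))
        else:
            w, var = [], corr[i][i]
        conds.append((prev, w, math.sqrt(max(var, 0.0))))
    return conds


def corrected_pvalue(p_min, corr, window, n_samples=20000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(min_i p_i <= p_min) under the
    null, treating the two-sided marker statistics Z as MVN(0, corr)."""
    t = NormalDist().inv_cdf(1.0 - p_min / 2.0)  # |Z| >= t  <=>  two-sided p <= p_min
    conds = precompute_conditionals(corr, window)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        z = []
        for prev, w, sd in conds:
            mean = sum(wk * z[k] for wk, k in zip(w, prev))
            z.append(rng.gauss(mean, sd))
        if max(abs(v) for v in z) >= t:
            hits += 1
    return hits / n_samples


def ar1_corr(n, rho):
    """Hypothetical example correlation: Corr(Z_i, Z_j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    indep = corrected_pvalue(0.01, ar1_corr(10, 0.0), window=1)
    linked = corrected_pvalue(0.01, ar1_corr(10, 0.9), window=1)
    print(f"independent markers: {indep:.4f} (Sidak: {1 - 0.99 ** 10:.4f})")
    print(f"AR(1) markers, rho=0.9: {linked:.4f}")
```

With `window=1` the AR(1) example is sampled exactly, because an AR(1) sequence depends only on its immediate predecessor; for real linkage-disequilibrium matrices a larger window trades speed for fidelity. Precomputing the conditional weights once per marker keeps the cost of each sample proportional to the number of markers times the window size.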
