Department of Informatics, University of Oslo, Oslo, Norway.
Bioinformatics. 2011 Dec 1;27(23):3235-41. doi: 10.1093/bioinformatics/btr568. Epub 2011 Oct 13.
In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high.
Here we present MCFDR, a simple, novel algorithm for false discovery rate (FDR) modulated sequential Monte Carlo (MC) multiple hypothesis testing. The algorithm alternates between adding MC samples across tests and calculating intermediate FDR values for the collection of tests. MC sampling is stopped either by sequential MC or by a threshold on FDR. An essential property of the algorithm is that it limits the total number of MC samples regardless of the number of true null hypotheses. We show on both real and simulated data that the proposed algorithm provides large gains in computational efficiency.
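The iteration described above can be illustrated with a minimal Python sketch. It is not the HyperBrowser implementation, and its details are assumptions that the abstract does not specify: the sequential MC rule is taken to be a Besag-Clifford-style stop after `h` null samples at least as extreme as the observed statistic, the intermediate FDR values are taken to be Benjamini-Hochberg adjusted p-values, and the FDR stop is taken to fire once every test not yet stopped by sequential MC falls below `fdr_threshold`. The names `mcfdr_sketch`, `bh_adjust`, `draw_null`, `h`, `batch` and `max_samples` are hypothetical.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (used here as intermediate FDR values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def mcfdr_sketch(observed, draw_null, h=20, fdr_threshold=0.1,
                 batch=100, max_samples=10000, seed=0):
    """Alternate between adding MC samples and recomputing FDR values.

    observed  : observed test statistic per test (larger = more extreme)
    draw_null : hypothetical callable (rng, i) -> one null statistic for test i
    Returns (p, q, n): p-value estimates, FDR values, MC samples used per test.
    """
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed, dtype=float)
    m = observed.size
    exceed = np.zeros(m, dtype=int)   # null samples >= observed statistic
    n = np.zeros(m, dtype=int)        # MC samples drawn so far
    active = np.ones(m, dtype=bool)   # tests still being sampled

    while active.any():
        # Step 1: add a batch of MC samples to every active test.
        for i in np.flatnonzero(active):
            for _ in range(min(batch, max_samples - n[i])):
                n[i] += 1
                if draw_null(rng, i) >= observed[i]:
                    exceed[i] += 1
                    if exceed[i] >= h:  # sequential MC stop for this test
                        break

        # Step 2: intermediate p-values and FDR values for all tests.
        stopped_seq = exceed >= h
        p = np.where(stopped_seq, exceed / n, (exceed + 1) / (n + 1))
        q = bh_adjust(p)

        # Step 3: decide which tests keep sampling.
        active = ~stopped_seq & (n < max_samples)
        if np.all(q[~stopped_seq] < fdr_threshold):
            active[:] = False  # FDR-based stop for all remaining tests

    return p, q, n
```

Under these assumptions, tests with unremarkable statistics stop early through the sequential rule, while the FDR check ends sampling for the remaining tests together, which bounds the total number of MC samples in both cases. A call such as `mcfdr_sketch([0.0] * 5 + [6.0] * 5, lambda rng, i: rng.normal())` runs the sketch on ten tests against a standard normal null.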
MCFDR is implemented in the Genomic HyperBrowser (http://hyperbrowser.uio.no/mcfdr), a web-based system for genome analysis. All input data and results are available and can be reproduced through a Galaxy Pages document at: http://hyperbrowser.uio.no/mcfdr/u/sandve/p/mcfdr.