Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20892, USA.
Biostatistics. 2011 Jul;12(3):582-93. doi: 10.1093/biostatistics/kxq078. Epub 2011 Jan 5.
Resampling-based tests, which often rely on permutation or bootstrap procedures, have been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. Assessing the significance level of such a test requires recalculating the test statistic on a large number of simulated data sets, which can be very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can readily be used to estimate the p-value for any resampling-based test. We show through numerical simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^-6). With its computational burden reduced by the proposed procedure, the versatile resampling-based test becomes computationally feasible for a much wider range of applications. We demonstrate the new method by applying it to a large-scale genetic association study of prostate cancer.
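To illustrate why the standard procedure becomes expensive, here is a minimal sketch of a conventional permutation test. It is the baseline the abstract compares against, not the proposed stochastic approximation Markov chain Monte Carlo procedure. The function name `permutation_p_value`, the difference-in-means statistic, and the add-one correction are assumptions chosen for illustration.

```python
import random


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means.

    Illustrative baseline only; the statistic and names are assumptions.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(a, b):
        # Absolute difference between the two group means.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    exceed = 0
    for _ in range(n_resamples):
        # Each resample shuffles the group labels and recomputes the statistic.
        rng.shuffle(pooled)
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

With this estimator, the smallest value that can be reported is 1 / (n_resamples + 1), so resolving a p-value below 10^-6 takes more than a million recomputations of the test statistic. That cost is what the proposed procedure aims to reduce.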