Heskes Tom, Eisinga Rob, Breitling Rainer
Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.
Department of Social Science Research Methods, Radboud University Nijmegen, Nijmegen, The Netherlands.
BMC Bioinformatics. 2014 Nov 21;15(1):367. doi: 10.1186/s12859-014-0367-1.
The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is the accurate calculation of the p-value of the rank product statistic, which is needed to address multiple testing adequately. Exact calculation, permutation approximations and gamma approximations have all been proposed to determine molecule-level significance. These approaches have serious drawbacks: they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution.
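To make the statistic concrete, the Python sketch below computes the rank product of one molecule (the product of its ranks over k replicates) and its exact p-value under the usual null hypothesis that the molecule's ranks are independent and uniform on {1, ..., n}, where n is the number of molecules. The exact p-value is obtained here by a simple memoized recursive count of rank tuples whose product is at most ρ. This counting scheme is an illustrative assumption, not the algorithm of the paper or of the published R code, and the names `rank_product` and `exact_pvalue` are hypothetical.

```python
from functools import lru_cache


def rank_product(ranks):
    """Rank product of one molecule: the product of its ranks over k replicates."""
    rp = 1
    for r in ranks:
        rp *= r
    return rp


def exact_pvalue(rho, n, k):
    """Exact P(r_1 * ... * r_k <= rho) when each r_i is independent and
    uniform on {1, ..., n}.

    Hypothetical helper (illustration only): counts the k-tuples of ranks whose
    product is at most rho and divides by the n**k equally likely tuples.
    """

    @lru_cache(maxsize=None)
    def count(m, bound):
        # Number of m-tuples drawn from {1, ..., n} whose product is <= bound.
        if bound < 1:
            return 0
        if m == 0:
            return 1
        # Fixing the first rank r, the remaining m - 1 ranks must have a product
        # <= bound // r; integer floor division is exact because products are integers.
        return sum(count(m - 1, bound // r) for r in range(1, min(n, bound) + 1))

    return count(k, rho) / n**k
```

For example, a molecule ranked 3, 1 and 2 among n = 10 molecules in k = 3 replicates has rank product 6, and `exact_pvalue(6, 10, 3)` finds 25 of the 1000 possible rank triples with product at most 6, giving 0.025. A naive enumeration would visit all n**k tuples; the memoized recursion avoids that for small examples but is not meant as an efficient implementation.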
We derive strict lower and upper bounds on the exact p-value, along with an accurate approximation; together these allow the significance of the rank product statistic to be assessed in a computationally fast manner. The bounds and the proposed approximation are shown to be far more accurate than existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis of transcriptomic profiling performed in blood.
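The tail behaviour of continuous approximations can be explored numerically. The sketch below compares, for small n and k, the exact p-value (via a recursive count of rank tuples) with two continuous quantities. The first is a gamma approximation that treats the scaled ranks r_i/(n+1) as independent Uniform(0, 1) variables, so that minus the log of their product follows a Gamma(k, 1) distribution. The second is an elementary upper bound: writing each rank as r_i = ⌈U_i⌉ with U_i uniform on (0, n] reproduces the null distribution, and since ⌈U_i⌉ ≥ U_i, the event r_1⋯r_k ≤ ρ implies U_1⋯U_k ≤ ρ. This bound is offered only as an illustration and is not claimed to be the bound derived in the paper; all function names are hypothetical.

```python
import math
from functools import lru_cache


def exact_pvalue(rho, n, k):
    """Exact P(r_1 * ... * r_k <= rho) for independent ranks uniform on {1, ..., n}."""

    @lru_cache(maxsize=None)
    def count(m, bound):
        # Number of m-tuples from {1, ..., n} whose product is <= bound.
        if bound < 1:
            return 0
        if m == 0:
            return 1
        return sum(count(m - 1, bound // r) for r in range(1, min(n, bound) + 1))

    return count(k, rho) / n**k


def uniform_product_cdf(x, k):
    """P(V_1 * ... * V_k <= x) for independent V_i ~ Uniform(0, 1).

    Uses -log(V_1 * ... * V_k) ~ Gamma(k, 1); its upper tail at t = -log(x)
    equals exp(-t) * sum_{j<k} t**j / j! = x * sum_{j<k} t**j / j!.
    """
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    t = -math.log(x)
    return x * sum(t**j / math.factorial(j) for j in range(k))


def gamma_approx_pvalue(rho, n, k):
    """Gamma approximation: treat r_i / (n + 1) as continuous Uniform(0, 1)."""
    return uniform_product_cdf(rho / (n + 1) ** k, k)


def continuous_upper_bound(rho, n, k):
    """Illustrative upper bound: r_i = ceil(U_i) with U_i ~ Uniform(0, n],
    so r_1 * ... * r_k <= rho implies U_1 * ... * U_k <= rho."""
    return uniform_product_cdf(rho / n**k, k)


if __name__ == "__main__":
    n, k = 50, 4
    for rho in (1, 10, 100, 1000):
        print(rho, exact_pvalue(rho, n, k),
              gamma_approx_pvalue(rho, n, k), continuous_upper_bound(rho, n, k))
```

With n = 50 and k = 4, the exact p-value at ρ = 1 is 1/50⁴ ≈ 1.6 × 10⁻⁷, whereas both continuous quantities are above 10⁻⁴. This shows how approximations that ignore the discreteness of ranks can be far off in the extreme tail, which is where molecule-level significance under multiple testing is decided.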
We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm increases throughput by an order of magnitude compared with current approaches and opens up new application domains with even larger multiple testing problems. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .