Computational & Mathematical Biology, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore.
Biol Direct. 2011 May 20;6:27. doi: 10.1186/1745-6150-6-27.
False discovery rate (FDR) control is commonly accepted as the most appropriate form of error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the p-values estimated for each test and on the validity of the underlying distributional assumptions. However, in many practical testing problems, such as those in genomics, the p-values may be under-estimated or over-estimated for many known or unknown reasons. The resulting FDR estimates are then distorted and become unreliable.
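To make the dependence of FDR estimates on the input p-values concrete, the following minimal sketch computes Benjamini-Hochberg adjusted p-values with NumPy. It is a standard textbook procedure shown for illustration only, not the method proposed in this paper; the function name `bh_adjust` is a hypothetical choice.

```python
import numpy as np


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (illustrative helper)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(bh_adjust(raw))
    # Systematically shrunken p-values (here: squared, purely as a toy
    # distortion) yield smaller adjusted values, so the FDR looks better
    # than it is.
    print(bh_adjust(np.square(raw)))
```

Because the adjustment is a deterministic function of the p-values alone, any systematic bias in the inputs carries straight through to the FDR estimate.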
We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R), which recalibrates the empirical p-values by modeling their distribution in order to improve the FDR estimates. ConReg-R is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution, and that the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and from true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and the FDR are estimated after the recalibration.
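The mixture view described above can be illustrated with a short sketch. The abstract does not specify the constrained regression used by ConReg-R, so the code below swaps in a simpler, well-known estimator as an assumption: it reads π0 off the right-hand part of the p-value distribution, where p-values from true nulls (uniform) should dominate, and then estimates the FDR at a chosen threshold. The function names and the tuning parameter `lam` are hypothetical.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from p-values above `lam`.

    Assumes null p-values are uniform on [0, 1], so the fraction of
    p-values exceeding `lam` is roughly pi0 * (1 - lam).
    """
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))


def estimate_fdr(pvalues, threshold, lam=0.5):
    """Estimate the FDR when rejecting every test with p <= threshold."""
    p = np.asarray(pvalues, dtype=float)
    pi0 = estimate_pi0(p, lam)
    rejected = max(int(np.sum(p <= threshold)), 1)  # avoid division by zero
    expected_false = pi0 * p.size * threshold       # uniform nulls below threshold
    return min(1.0, expected_false / rejected)


if __name__ == "__main__":
    pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8, 0.7, 0.9]
    print(estimate_pi0(pvals))
    print(estimate_fdr(pvals, threshold=0.01))
```

Both estimates are only as good as the uniformity assumption for the null p-values, which is the property a recalibration step aims to restore before π0 and the FDR are computed.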
ConReg-R provides an efficient way to improve FDR estimates. It requires only the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments.