Scheid Stefanie, Spang Rainer
Max Planck Institute for Molecular Genetics, Computational Diagnostics, Berlin, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jul-Sep;1(3):98-108. doi: 10.1109/TCBB.2004.24.
Screening for differential gene expression in microarray studies leads to difficult large-scale multiple testing problems. The local false discovery rate is a statistical concept for quantifying uncertainty in multiple testing. In this paper, we introduce a novel estimator for the local false discovery rate that is based on an algorithm that splits all genes into two groups, representing induced and noninduced genes, respectively. Starting from the full set of genes, we successively exclude genes until the gene-wise p-values of the remaining genes look like a typical sample from a uniform distribution. In comparison to other methods, our algorithm performs comparably in detecting the shape of the local false discovery rate and has a smaller bias when estimating the overall percentage of noninduced genes. Our algorithm is implemented in the Bioconductor-compatible R package TWILIGHT version 1.0.1, which is available from http://compdiag.molgen.mpg.de/software or from the Bioconductor project at http://www.bioconductor.org.
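The abstract does not state TWILIGHT's exact exclusion criterion, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' algorithm or the package's API. We assume (1) "looks like a uniform sample" is checked with the one-sided Kolmogorov-Smirnov statistic D+ against its asymptotic critical value sqrt(-ln(alpha)/2); (2) exclusion is greedy, always dropping the smallest remaining p-value; (3) the share of kept genes is the estimate of pi0, the proportion of noninduced genes; and (4) the local false discovery rate lfdr(p) = pi0 * f0(p) / f(p), with the uniform null density f0 = 1, is estimated from a histogram of all p-values. The function names `one_sided_ks`, `successive_exclusion` and `local_fdr` are hypothetical, and the simulated data are purely illustrative.

```python
import math
import random


def one_sided_ks(sorted_p):
    """D+ = max_i (i/n - p_(i)): largest excess of the empirical CDF over the Uniform(0, 1) CDF.

    Expects p-values in increasing order; a pile-up of small p-values
    (as produced by induced genes) makes D+ large.
    """
    n = len(sorted_p)
    return max((i + 1) / n - p for i, p in enumerate(sorted_p))


def successive_exclusion(pvals, alpha=0.05):
    """Hypothetical greedy sketch: drop the smallest p-value until the rest pass a one-sided KS check.

    Returns (pi0_hat, kept), where kept holds the p-values treated as 'noninduced'.
    """
    crit = math.sqrt(-math.log(alpha) / 2)  # asymptotic critical value of sqrt(n) * D+
    s = sorted(pvals)
    k = 0  # number of genes excluded so far
    while k < len(s) - 1 and math.sqrt(len(s) - k) * one_sided_ks(s[k:]) > crit:
        k += 1
    return (len(s) - k) / len(s), s[k:]


def local_fdr(pvals, pi0, bins=20):
    """Histogram estimate of lfdr(p) = pi0 * f0(p) / f(p) with f0 = 1, one value per bin."""
    n = len(pvals)
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    # f on a bin of width 1/bins is approximated by count / (n / bins)
    return [min(1.0, pi0 * n / (bins * c)) if c else 1.0 for c in counts]


# Illustrative simulation: 900 noninduced genes and 100 induced genes.
rng = random.Random(1)
null = [rng.random() for _ in range(900)]           # noninduced: Uniform(0, 1)
induced = [rng.random() ** 10 for _ in range(100)]  # induced: Beta(0.1, 1), piled up near 0
pvals = null + induced

pi0_hat, kept = successive_exclusion(pvals)
lfdr = local_fdr(pvals, pi0_hat)
print(round(pi0_hat, 3), [round(x, 2) for x in lfdr[:3]])
```

Dropping the smallest p-value never increases sqrt(n) * D+, so the loop always terminates. The trade-off is that the split reduces to a simple threshold on p. Also, because the loop stops as soon as the remaining excess falls within sampling noise, some induced genes stay in the kept group, and pi0 then tends to be overestimated.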