Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany.
Bioinformatics. 2011 May 15;27(10):1377-83. doi: 10.1093/bioinformatics/btr152. Epub 2011 Mar 26.
An important objective in the analysis of high-throughput genomic data is to find an association between the expression profile of functional gene sets and the different levels of a group response. In contrast to multiple testing procedures, which focus on single genes, global tests are usually used to detect a group effect in an entire gene set. In a simulation study, we compare the power and computation times of four different approaches for global testing. The applicability of one of these methods to gene expression data is demonstrated for the first time. In addition, we propose an algorithm for detecting the genes that might be responsible for a group effect.
We found that the power of three of the approaches is comparable in many settings, but their computation times differ considerably. Our proposed gene selection algorithm detected potentially effect-causing genes in artificial sets with high power when many genes were altered with a small effect, while classical multiple testing was more powerful when few genes were altered with a large effect.
An R package called 'RepeatedHighDim', which implements our new global test procedures, is available from http://cran.r-project.org/.
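To illustrate what a global test on a gene set means, as opposed to testing each gene separately, the following is a minimal Python sketch of a generic permutation-based global test. It is not the procedure implemented in 'RepeatedHighDim' and not one of the four approaches compared in the study; the statistic (sum of squared gene-wise mean differences), the function names, and the simulated data are all assumptions chosen for illustration.

```python
import random


def global_statistic(group_a, group_b):
    """Sum over genes of the squared difference in group means.

    Each group is a list of samples; each sample is a list of
    expression values, one per gene in the set.
    """
    n_genes = len(group_a[0])
    stat = 0.0
    for g in range(n_genes):
        mean_a = sum(sample[g] for sample in group_a) / len(group_a)
        mean_b = sum(sample[g] for sample in group_b) / len(group_b)
        stat += (mean_a - mean_b) ** 2
    return stat


def permutation_global_test(group_a, group_b, n_perm=1000, seed=0):
    """Permutation p-value for a group effect in the whole gene set.

    Group labels are shuffled across samples; the p-value is the
    fraction of permutations whose statistic is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = global_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if global_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: 10 samples per group, 50 genes, with a small
    # shift of 0.5 in every gene of the second group.
    rng = random.Random(1)
    controls = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(10)]
    cases = [[rng.gauss(0.5, 1.0) for _ in range(50)] for _ in range(10)]
    print("global p-value:", permutation_global_test(controls, cases))
```

A single p-value is returned for the entire gene set, so many small gene-wise shifts can accumulate into a detectable group effect even when no single gene would survive a multiple testing correction.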