Department of Informatics, Ludwig-Maximilians Universität, Amalienstrasse 17, Munich, Germany.
Bioinformatics. 2012 Jun 1;28(11):1480-6. doi: 10.1093/bioinformatics/bts164. Epub 2012 Apr 5.
Several statistical tests are available to detect the enrichment of differential expression in gene sets. Such tests were originally proposed for analyzing gene sets associated with biological processes. Objective evaluation of these tests on real measurements has not been possible, because it is difficult to decide a priori which processes will be affected in a given experiment.
We present a first large-scale study to rigorously assess and compare the performance of gene set enrichment tests on real expression measurements. Gene sets are defined based on the targets of given regulators such as transcription factors (TFs) and microRNAs (miRNAs). In contrast to processes, TFs and miRNAs are amenable to direct perturbation, e.g. over-expression or deletion of the regulator. We assess the ability of 14 different statistical tests to predict the perturbations from expression measurements in Escherichia coli, Saccharomyces cerevisiae and human. We also use a permutation approach to analyze how performance depends on the quality and comprehensiveness of the regulator targets. We find that ANOVA and Wilcoxon's test consistently perform better than, for instance, the Kolmogorov-Smirnov and hypergeometric tests. For scenarios where the optimal test is not known, we suggest combining all evaluated tests into an unweighted consensus, which also performs well in our assessment. Our results provide a guide for the selection of existing tests as well as a basis for the development and assessment of novel tests.
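The abstract names the compared tests but not their implementation. As an illustration only, here is a minimal Python sketch (standard library only) of two of the test families mentioned, a one-sided Wilcoxon rank-sum test and a hypergeometric over-representation test, applied to regulator target sets. It also includes one possible unweighted consensus. The sketch makes three assumptions: per-gene scores such as absolute log fold changes, a fixed cutoff that defines differentially expressed genes, and a consensus defined as the mean of per-test ranks. The abstract does not say how the authors build their consensus. All function names and data are hypothetical, and this is not the authors' code.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_set_pvalue(scores, in_set):
    """One-sided rank-sum (Mann-Whitney U) test: do set members score higher
    than the remaining genes? Normal approximation without tie correction."""
    n1 = sum(in_set)
    n2 = len(scores) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set and background must both be non-empty")
    ranks = rank_with_ties(scores)
    rank_sum = sum(r for r, member in zip(ranks, in_set) if member)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)


def hypergeom_set_pvalue(is_de, in_set):
    """One-sided hypergeometric test: P(X >= k), where k is the number of
    differentially expressed (DE) genes that fall inside the set."""
    n_genes = len(is_de)
    set_size = sum(in_set)
    n_de = sum(is_de)
    k = sum(1 for de, member in zip(is_de, in_set) if de and member)
    tail = sum(
        math.comb(set_size, x) * math.comb(n_genes - set_size, n_de - x)
        for x in range(k, min(set_size, n_de) + 1)
    )
    return tail / math.comb(n_genes, n_de)


def consensus_ranking(pvalues_by_test):
    """Unweighted consensus (assumed here to be rank averaging): rank the gene
    sets by p-value within each test, then sort by mean rank across tests."""
    set_names = sorted(next(iter(pvalues_by_test.values())))
    rank_totals = dict.fromkeys(set_names, 0.0)
    for pvals in pvalues_by_test.values():
        ranks = rank_with_ties([pvals[name] for name in set_names])
        for name, rank in zip(set_names, ranks):
            rank_totals[name] += rank
    n_tests = len(pvalues_by_test)
    return sorted(set_names, key=lambda name: rank_totals[name] / n_tests)


# Toy example (made-up numbers): per-gene scores, e.g. |log fold change|,
# and two hypothetical regulator target sets "A" and "B".
scores = [5.0, 4.5, 4.0, 1.0, 0.9, 0.5, 0.4, 0.3, 0.8, 0.7]
targets = {"A": {0, 1, 2}, "B": {5, 6, 7}}
is_de = [s > 2.0 for s in scores]  # arbitrary cutoff for the hypergeometric test

pvalues_by_test = {"wilcoxon": {}, "hypergeometric": {}}
for name, members in targets.items():
    in_set = [i in members for i in range(len(scores))]
    pvalues_by_test["wilcoxon"][name] = wilcoxon_set_pvalue(scores, in_set)
    pvalues_by_test["hypergeometric"][name] = hypergeom_set_pvalue(is_de, in_set)

ranking = consensus_ranking(pvalues_by_test)
print(ranking)  # ['A', 'B']
```

In this toy case both tests rank set A, whose targets have high scores, above set B, so the consensus does too. Rank averaging is one simple way to combine tests without weights. P-values from different tests are not directly comparable, but per-test ranks put every test on the same scale.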