Department of Computer Science, Wayne State University, Detroit, 48202, USA.
Department of Computer Science and Engineering, University of Nevada, Reno, 89557, USA.
Genome Biol. 2019 Oct 9;20(1):203. doi: 10.1186/s13059-019-1790-4.
Many high-throughput experiments compare two phenotypes, such as disease vs. healthy, with the goal of understanding the underlying biological phenomena that characterize the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These fall into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different angles, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of pathway analysis approaches rely on the assumption that p values are uniformly distributed under the null hypothesis, which often does not hold.
This article presents the most comprehensive comparative study of pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in more than 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested.
Overall, the results show that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected, since TB methods take into consideration the structure of the pathway, which is meant to describe the underlying phenomena. We also find that most, if not all, of the listed approaches are biased and can produce skewed results under the null hypothesis.
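To make the notion of bias under the null hypothesis concrete, the following minimal Python sketch shows one way departure from uniformity could be quantified. It is an illustration under stated assumptions, not the procedure used in the study: `simulate_null_p_values` is a hypothetical stand-in for running a pathway method on data with no real phenotype difference, its `skew` parameter is invented for the example, and the Kolmogorov-Smirnov distance is just one possible measure of deviation from Uniform(0, 1).

```python
import random


def ks_distance_from_uniform(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of
    p_values and the CDF of Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x;
        # the uniform CDF evaluated at x is x itself.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def simulate_null_p_values(n_runs, skew, rng):
    """Hypothetical stand-in for a pathway method applied to null data.

    skew == 1.0 yields uniform p values (an unbiased method);
    skew > 1.0 pushes p values toward 0 (a method that reports
    too many significant pathways under the null).
    """
    return [rng.random() ** skew for _ in range(n_runs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
unbiased = simulate_null_p_values(5000, 1.0, rng)
biased = simulate_null_p_values(5000, 2.0, rng)

d_unbiased = ks_distance_from_uniform(unbiased)
d_biased = ks_distance_from_uniform(biased)

# Fraction of null runs that would be called significant at 0.05.
frac_unbiased = sum(p < 0.05 for p in unbiased) / len(unbiased)
frac_biased = sum(p < 0.05 for p in biased) / len(biased)

print(f"KS distance: unbiased={d_unbiased:.3f}, biased={d_biased:.3f}")
print(f"Fraction p < 0.05: unbiased={frac_unbiased:.3f}, biased={frac_biased:.3f}")
```

In this toy setting, a method whose null p values are skewed toward 0 flags far more than 5% of null runs at the 0.05 threshold, which is the kind of false-positive inflation that a violated uniformity assumption can produce.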