Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA.
Bioinformatics. 2012 Dec 1;28(23):3073-80. doi: 10.1093/bioinformatics/bts579. Epub 2012 Oct 7.
The analysis of differentially expressed gene sets has become routine in the analysis of gene expression data. A multitude of tests is available, ranging from aggregation tests that summarize gene-level statistics for a gene set to true multivariate tests that account for intergene correlations. Most of them detect complex departures from the null hypothesis, but when the null hypothesis is rejected, the specific alternative leading to the rejection is not easily identifiable.
In this article, we compare the power and Type I error rates of minimum spanning tree (MST)-based non-parametric multivariate tests with those of several multivariate and aggregation tests that are frequently used for pathway analyses. In our simulation study, we demonstrate that MST-based tests have power that is, for many settings, comparable with that of conventional approaches, and that they outperform conventional approaches in specific regions of the parameter space corresponding to biologically relevant configurations. Further, we find for both simulated and gene expression data that MST-based tests discriminate well against shift and scale alternatives. As a general result, we suggest a two-step practical analysis strategy that may increase the interpretability of experimental data: first, apply the most powerful multivariate test to find the subset of pathways for which the null hypothesis is rejected; second, apply MST-based tests to these pathways to select those that support specific alternative hypotheses.
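As an illustration of what an MST-based two-sample test can look like, the following is a minimal sketch of a multivariate runs statistic in the style of Friedman and Rafsky, with a label-permutation P-value. The abstract does not specify which MST-based statistics were used, so the choice of statistic, the Euclidean distance, and the function names (`mst_edges`, `count_runs`, `mst_runs_test`) are all assumptions made for illustration. Each sample is a vector of expression values for the genes in one gene set.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 edges (i, j) of a minimum spanning tree.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known distance to the tree
    best_from = [-1] * n         # tree vertex that realizes that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def count_runs(edges, labels):
    """Runs = 1 + number of MST edges joining samples of different groups."""
    return 1 + sum(labels[i] != labels[j] for i, j in edges)


def mst_runs_test(group_a, group_b, n_perm=999, seed=0):
    """Hypothetical helper: runs statistic with a permutation P-value.

    Few runs mean the two groups occupy different regions of the pooled
    MST, which is evidence against the null of identical distributions.
    """
    rng = random.Random(seed)
    points = list(group_a) + list(group_b)
    labels = [0] * len(group_a) + [1] * len(group_b)
    # The tree depends only on the pooled samples, so build it once.
    edges = mst_edges(points)
    observed = count_runs(edges, labels)
    hits = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        if count_runs(edges, permuted) <= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

In the two-step strategy described above, a function of this kind would play the role of the second step only, applied to the pathways already flagged by a more powerful multivariate test. Because the tree is built once on the pooled samples, each permutation only recounts cross-group edges, which keeps the permutation loop cheap.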
gvglazko@uams.edu or yrahmatallah@uams.edu
Supplementary data are available at Bioinformatics online.