Simillion Cedric, Liechti Robin, Lischer Heidi E L, Ioannidis Vassilios, Bruggmann Rémy
Interfaculty Bioinformatics Unit and SIB Swiss Institute of Bioinformatics, University of Bern, Baltzerstrasse 6, 3012, Berne, Switzerland.
Department of Clinical Research, University of Bern, Murtenstrasse 35, 3008, Berne, Switzerland.
BMC Bioinformatics. 2017 Mar 4;18(1):151. doi: 10.1186/s12859-017-1571-6.
The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses.
Here we present SetRank, an advanced GSEA algorithm that can eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that were initially flagged as significant if their significance is due only to their overlap with another gene set. The algorithm is explained in detail, and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA.
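The overlap principle described above can be illustrated with a minimal Python sketch. This is a simplified illustration and not the published SetRank procedure or the API of the R package. The function names, the use of a one-sided hypergeometric test, the 0.05 cutoff, and the toy gene identifiers are all assumptions made for this example. A significant set A is dropped in favour of an overlapping significant set B when A loses significance once the genes shared with B are removed, while B stays significant without the genes shared with A.

```python
from math import comb


def hypergeom_tail(universe_size, set_size, n_hits, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap)."""
    total = comb(universe_size, n_hits)
    upper = min(set_size, n_hits)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_p(gene_set, hits, universe):
    """Enrichment p-value of gene_set among the hit genes."""
    gene_set = gene_set & universe
    return hypergeom_tail(
        len(universe), len(gene_set), len(hits), len(gene_set & hits)
    )


def filter_overlap_driven(gene_sets, hits, universe, alpha=0.05):
    """Keep significant sets whose signal is not explained by overlap alone.

    Assumption of this sketch: ambiguous pairs, where neither set is
    significant without the shared genes, are both kept.
    """
    significant = {
        name: genes
        for name, genes in gene_sets.items()
        if enrichment_p(genes, hits, universe) <= alpha
    }
    kept = set(significant)
    for a, genes_a in significant.items():
        for b, genes_b in significant.items():
            if a == b or not (genes_a & genes_b):
                continue
            a_alone = enrichment_p(genes_a - genes_b, hits, universe)
            b_alone = enrichment_p(genes_b - genes_a, hits, universe)
            if a_alone > alpha and b_alone <= alpha:
                kept.discard(a)  # a is significant only through b
                break
    return sorted(kept)


# Toy data (hypothetical): 100 genes, of which g0..g9 are the hits.
universe = {f"g{i}" for i in range(100)}
hits = {f"g{i}" for i in range(10)}
gene_sets = {
    "core": {f"g{i}" for i in range(9)} | {"g10"},
    "broad": {f"g{i}" for i in range(6)} | {f"g{i}" for i in range(20, 24)},
    "other": {"g9"} | {f"g{i}" for i in range(30, 39)},
}

print(filter_overlap_driven(gene_sets, hits, universe))
```

In this toy data, "core" and "broad" are both significant at first, but every hit in "broad" is also in "core", so "broad" is discarded. "other" never reaches significance, and only "core" is reported.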
The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.