Debrabant Birgit
Department of Epidemiology, Biostatistics and Biodemography, University of Southern Denmark, 5000 Odense C, Denmark.
Bioinformatics. 2017 May 1;33(9):1271-1277. doi: 10.1093/bioinformatics/btw803.
Competitive gene set analysis aims to assess whether a specific set of genes is more strongly associated with a trait than the remaining genes. However, the statistical models assumed to date to underlie these methods do not allow a clear-cut formulation of the competitive null hypothesis. This is a major obstacle to interpreting the results of a gene set analysis.
This work presents a hierarchical statistical model, based on the notion of dependence measures, that overcomes this problem. The two levels of the model naturally reflect the modular structure of many gene set analysis methods. We apply the model to show that the popular GSEA method, which has recently been claimed to test the self-contained null hypothesis, actually tests the competitive null hypothesis if the weight parameter is zero. For this result to hold strictly, however, the choice of the dependence measures underlying GSEA, and of the estimators used for them, is crucial.
Supplementary material is available at Bioinformatics online.
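To make the role of the GSEA weight parameter concrete, the following is a minimal sketch of the usual GSEA running-sum enrichment score, not of the hierarchical model proposed in the paper. It assumes the standard definition in which genes are ranked by a gene-level association score and each gene in the set contributes a step proportional to |score| raised to the weight. The function name `enrichment_score` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def enrichment_score(scores, in_set, weight=0.0):
    """Running-sum enrichment score (illustrative sketch, hypothetical helper).

    scores : gene-level association scores (e.g. correlations with the trait)
    in_set : boolean mask, True for genes belonging to the gene set
    weight : GSEA weight parameter; 0 gives an unweighted, rank-only statistic
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)

    # Rank genes from most to least associated with the trait.
    order = np.argsort(-scores, kind="stable")
    hits = in_set[order]

    # Step sizes for genes in the set; with weight == 0 every step equals 1.
    steps = np.where(hits, np.abs(scores[order]) ** weight, 0.0)

    # Cumulative fractions of set genes (weighted) and non-set genes seen so far.
    p_hit = np.cumsum(steps) / steps.sum()
    p_miss = np.cumsum(~hits) / (~hits).sum()

    # The score is the running difference at its largest absolute deviation.
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))]

# With weight 0 the result depends only on the ranking, so two score vectors
# with the same ordering yield the same enrichment score.
mask = [True, False, True, False, False, False]
es_a = enrichment_score([5, 4, 3, 2, 1, 0.5], mask, weight=0.0)
es_b = enrichment_score([50, 8, 7, 3, 2, 1], mask, weight=0.0)
```

With the weight set to zero, the statistic compares only the rank positions of the set genes with those of the remaining genes, which is the setting the abstract singles out. Under a positive weight, the magnitudes of the gene-level scores also enter the statistic.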