Department of Pathology, Yale University School of Medicine, New Haven, CT 06511, USA; Bioengineering Program, Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel; and Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA.
Nucleic Acids Res. 2013 Oct;41(18):e170. doi: 10.1093/nar/gkt660. Epub 2013 Aug 5.
Enrichment analysis of gene sets is a popular approach that provides a functional interpretation of genome-wide expression data. Existing tests are affected by inter-gene correlations, resulting in a high Type I error rate. The most widely used test, Gene Set Enrichment Analysis, relies on computationally intensive permutations of sample labels to generate a null distribution that preserves gene-gene correlations. A more recent approach, CAMERA, attempts to correct for these correlations by estimating a variance inflation factor directly from the data. Although these methods generate P-values for detecting gene set activity, they cannot produce confidence intervals or support post hoc comparisons. We have developed a new computational framework for Quantitative Set Analysis of Gene Expression (QuSAGE). QuSAGE accounts for inter-gene correlations, improves the estimation of the variance inflation factor and, rather than evaluating the deviation from a null hypothesis with a P-value, quantifies gene-set activity with a complete probability density function. From this probability density function, P-values and confidence intervals can be extracted and post hoc analysis can be carried out while maintaining statistical traceability. Compared with Gene Set Enrichment Analysis and CAMERA, QuSAGE exhibits better sensitivity and specificity on real data profiling the response to interferon therapy (in chronic Hepatitis C virus patients) and to Influenza A virus infection. QuSAGE is available as an R package, which includes the core functions for the method as well as functions to plot and visualize the results.
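To make the role of the variance inflation factor concrete, the following is a minimal sketch of the simplified, equal-variance form VIF = 1 + (m - 1) * mean pairwise correlation, which is the form usually associated with CAMERA. It is not the QuSAGE estimator, which the abstract describes only as an improvement on this idea. The function names and the input layout (one list of residual expression values per gene, with group means already removed) are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def variance_inflation_factor(residuals):
    """Simplified VIF for a gene set.

    residuals: list of per-gene lists (genes x samples), hypothetical layout.
    Returns 1 + (m - 1) * rho_bar, where rho_bar is the mean pairwise
    correlation between genes; a value of 1 means no inflation.
    """
    m = len(residuals)
    if m < 2:
        return 1.0  # a single gene has no inter-gene correlation to correct for
    rho_bar = mean(pearson(a, b) for a, b in combinations(residuals, 2))
    return 1.0 + (m - 1) * rho_bar
```

Under this simplification, the variance of a gene set's mean statistic is multiplied by the returned factor, so a set of perfectly correlated genes carries no more information than a single gene.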