Glazko Galina, Rahmatallah Yasir, Zybailov Boris, Emmert-Streib Frank
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA.
Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA.
Methods Mol Biol. 2017;1613:125-159. doi: 10.1007/978-1-4939-7027-8_7.
The analysis of gene sets (in the form of functionally related genes or pathways) has become the method of choice for extracting the strongest signals from omics data. The motivation for using gene sets instead of individual genes is two-fold. First, this approach incorporates pre-existing biological knowledge into the analysis and facilitates the interpretation of experimental results. Second, it employs a statistical hypothesis testing framework. Here, we briefly review the main Gene Set Analysis (GSA) approaches for testing differential expression of gene sets, as well as several GSA approaches that test statistical hypotheses beyond differential expression and so allow additional biological information to be extracted from the data. We distinguish three major types of GSA approaches, which test for (1) differential expression (DE), (2) differential variability (DV), and (3) differential co-expression (DC) of gene sets between two phenotypes. We also present a comparative analysis of power and Type I error rates for different approaches within each major type of GSA on simulated data. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings. The value of the three major types of GSA approaches is illustrated with a real data example. When applied to the same data set, the major types of GSA approaches yield complementary biological information.
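The three hypothesis types named in the abstract (DE, DV, DC) can be illustrated with a minimal sketch. The statistics below are simple stand-ins chosen for illustration and are not the approaches evaluated in the chapter. The function names, the choice of statistics, and the label-permutation scheme are all assumptions. Each input is a samples-by-genes expression matrix for one gene set in one phenotype.

```python
import numpy as np

def de_stat(x, y):
    # DE (illustrative): Euclidean norm of the per-gene mean differences.
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))

def dv_stat(x, y):
    # DV (illustrative): summed absolute log-ratio of per-gene sample variances.
    ratio = x.var(axis=0, ddof=1) / y.var(axis=0, ddof=1)
    return float(np.abs(np.log(ratio)).sum())

def dc_stat(x, y):
    # DC (illustrative): Frobenius norm of the difference between the
    # gene-gene correlation matrices of the two phenotypes.
    cx = np.corrcoef(x, rowvar=False)
    cy = np.corrcoef(y, rowvar=False)
    return float(np.linalg.norm(cx - cy))

def permutation_pvalue(x, y, stat, n_perm=500, seed=0):
    # Hypothetical helper: permute sample labels and compare the observed
    # statistic with its permutation distribution.
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if stat(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y, de_stat)`, and likewise with `dv_stat` or `dc_stat`, gives one p-value per hypothesis type for the same gene set. Because label permutation assumes the two phenotypes are exchangeable under the null, each statistic is sensitive to its own alternative but does not isolate it. For example, a large mean shift can also affect the permutation distribution of the variance statistic.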