Rahmatallah Yasir, Zybailov Boris, Emmert-Streib Frank, Glazko Galina
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA.
Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA.
BMC Bioinformatics. 2017 Jan 24;18(1):61. doi: 10.1186/s12859-017-1482-6.
Gene set analysis (in the form of functionally related genes or pathways) has become the method of choice for analyzing omics data in general and gene expression data in particular. Many statistical methods exist that either summarize gene-level statistics for a gene set or apply a multivariate statistic that accounts for intergene correlations. Most available methods detect complex departures from the null hypothesis but cannot identify the specific alternative hypothesis under which the null is rejected.
GSAR (Gene Set Analysis in R) is an open-source R/Bioconductor software package for gene set analysis (GSA). It implements self-contained multivariate non-parametric statistical methods testing a complex null hypothesis against specific alternatives, such as differences in mean (shift), variance (scale), or net correlation structure. The package also provides a graphical visualization tool, based on the union of two minimum spanning trees, for correlation networks to examine the change in the correlation structures of a gene set between two conditions and highlight influential genes (hubs).
Package GSAR provides a set of multivariate non-parametric statistical methods that test a complex null hypothesis against specific alternatives. The methods in package GSAR are applicable to any type of omics data that can be represented in a matrix format. The package, with detailed instructions and examples, is freely available under the GPL (>= 2) license from the Bioconductor web site.
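The idea of testing one null hypothesis against different specific alternatives can be illustrated with a minimal, standard-library Python sketch. This is not GSAR's implementation and does not use its API: the function names (`shift_stat`, `scale_stat`, `permutation_pvalue`), the two simple test statistics, and the simulated data are all assumptions chosen for illustration. The sketch treats a gene set as a list of samples, each a list of per-gene values, and uses sample-label permutation to obtain a p-value for either a mean (shift) or a variance (scale) difference between two conditions.

```python
import random
from statistics import mean, pvariance


def shift_stat(x, y):
    """Euclidean distance between the per-gene mean vectors of two groups."""
    genes = range(len(x[0]))
    return sum(
        (mean([s[g] for s in x]) - mean([s[g] for s in y])) ** 2 for g in genes
    ) ** 0.5


def scale_stat(x, y):
    """Sum over genes of the absolute difference in per-gene variance."""
    genes = range(len(x[0]))
    return sum(
        abs(pvariance([s[g] for s in x]) - pvariance([s[g] for s in y]))
        for g in genes
    )


def permutation_pvalue(x, y, stat, n_perm=500, seed=0):
    """Permutation p-value: shuffle sample labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Simulated (hypothetical) data: 20 samples x 5 genes per condition,
# with the second condition shifted in mean but not in variance.
data_rng = random.Random(42)
control = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
treated = [[data_rng.gauss(2.0, 1.0) for _ in range(5)] for _ in range(20)]

p_shift = permutation_pvalue(control, treated, shift_stat)
p_scale = permutation_pvalue(control, treated, scale_stat)
print(f"shift p-value: {p_shift:.4f}, scale p-value: {p_scale:.4f}")
```

Running the same permutation scheme with different statistics is what lets a rejection be attributed to a particular kind of departure from the null. The methods in GSAR pursue the same goal with different, multivariate test statistics.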