Santos Suzana de Siqueira, Galatro Thais Fernanda de Almeida, Watanabe Rodrigo Akira, Oba-Shinjo Sueli Mieko, Nagahashi Marie Suely Kazue, Fujita André
Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil.
Department of Neurology, School of Medicine, University of São Paulo, São Paulo, Brazil.
PLoS One. 2015 Aug 27;10(8):e0135831. doi: 10.1371/journal.pone.0135831. eCollection 2015.
Gene set analysis aims to identify predefined sets of functionally related genes that are differentially expressed between two conditions. Although gene set analysis has been very successful, incorporating biological knowledge about the gene sets and offering greater statistical power than gene-by-gene analyses, it does not take into account the correlation (association) structure among the genes. In this work, we present CoGA (Co-expression Graph Analyzer), an R package for the identification of groups of differentially associated genes between two phenotypes. The analysis is based on concepts from information theory applied to the spectral distributions of the gene co-expression graphs, such as the spectral entropy, which measures the randomness of a graph structure, and the Jensen-Shannon divergence, which discriminates between classes of graphs. The package also includes common measures for comparing gene co-expression networks in terms of their structural properties, such as centrality, degree distribution, shortest path length, and clustering coefficient. Besides the structural analyses, CoGA includes graphical interfaces for visual inspection of the networks, ranking of genes according to their "importance" in the network, and standard differential expression analysis. We show through both simulation experiments and analyses of real data that the statistical tests performed by CoGA control the rate of false positives and that CoGA is able to identify differentially co-expressed genes that other methods failed to detect.
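The two information-theoretic quantities named in the abstract can be illustrated with a minimal sketch. It is not CoGA's R implementation or its API: the function names, the correlation threshold, the Gaussian kernel bandwidth, and the use of adjacency-matrix eigenvalues are all assumptions made here for illustration. The sketch builds an unweighted co-expression graph from thresholded Pearson correlations, estimates the spectral density by kernel smoothing, and then computes the spectral entropy and the Jensen-Shannon divergence between two graphs.

```python
import numpy as np


def coexpression_graph(expr, threshold=0.5):
    """Binary adjacency matrix from a samples-by-genes expression matrix.

    Assumption: an edge links two genes when |Pearson r| > threshold.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def spectral_density(adj, grid, bandwidth=0.3):
    """Gaussian-kernel estimate of the adjacency eigenvalue density on a uniform grid."""
    eigenvalues = np.linalg.eigvalsh(adj)  # adj is symmetric
    z = (grid[:, None] - eigenvalues[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    dx = grid[1] - grid[0]
    return density / (density.sum() * dx)  # normalise so it integrates to 1


def spectral_entropy(density, dx):
    """Differential entropy -integral(rho * log(rho)), with 0 * log(0) taken as 0."""
    positive = density > 0
    return float(-np.sum(density[positive] * np.log(density[positive])) * dx)


def jensen_shannon(density_a, density_b, dx):
    """JS divergence: H(mixture) minus the mean of the two entropies."""
    mixture = 0.5 * (density_a + density_b)
    mean_entropy = 0.5 * (spectral_entropy(density_a, dx) + spectral_entropy(density_b, dx))
    return spectral_entropy(mixture, dx) - mean_entropy


if __name__ == "__main__":
    # Hypothetical toy data: one phenotype with independent genes,
    # one where all genes share a common factor.
    rng = np.random.default_rng(0)
    n_samples, n_genes = 50, 20
    independent = rng.normal(size=(n_samples, n_genes))
    shared_factor = rng.normal(size=(n_samples, 1))
    correlated = shared_factor + 0.5 * rng.normal(size=(n_samples, n_genes))

    # Adjacency eigenvalues of an n-node simple graph lie within [-n, n].
    grid = np.linspace(-n_genes - 1, n_genes + 1, 2001)
    dx = grid[1] - grid[0]
    rho_a = spectral_density(coexpression_graph(independent), grid)
    rho_b = spectral_density(coexpression_graph(correlated), grid)

    print("entropy A:", spectral_entropy(rho_a, dx))
    print("entropy B:", spectral_entropy(rho_b, dx))
    print("JS divergence:", jensen_shannon(rho_a, rho_b, dx))
```

In a real analysis the observed divergence would still need a significance assessment, for example by permuting the phenotype labels; that step is omitted from this sketch.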