CSIRO Agriculture & Food, St Lucia, QLD 4067, Australia.
School of Agriculture and Food Sciences, The University of Queensland, Brisbane, QLD 4072, Australia.
Genes (Basel). 2020 Oct 20;11(10):1231. doi: 10.3390/genes11101231.
Genome-wide gene expression analyses are routinely used to gain a systems-level understanding of complex processes, including network connectivity. Network connectivity tends to be built on a small subset of extremely high co-expression signals that are deemed significant, but this overlooks the vast majority of pairwise signals. Here, we developed a computational pipeline that assigns each gene's genome-wide pairwise co-expression distribution to one of eight template distribution shapes, which vary between unimodal, bimodal, skewed, or symmetrical and represent different proportions of positive and negative correlations. We then used a hypergeometric test to determine whether specific gene types (regulators versus non-regulators) and properties (differentially expressed or not) are associated with a particular distribution shape. We applied our methodology to five publicly available RNA sequencing (RNA-seq) datasets from four organisms in different physiological conditions and tissues. Our results suggest that genes can be assigned consistently to pre-defined distribution shapes, with respect to enrichment for differentially expressed and regulatory genes, in situations involving contrasting phenotypes, time-series, or physiological baseline data. There is indeed a striking additional biological signal in the genome-wide distribution of co-expression values, which would be overlooked by currently adopted approaches. Our method can be applied to extract further information from transcriptomic data and to help uncover the molecular mechanisms involved in the regulation of complex biological processes and phenotypes.
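The two steps described in the abstract (building a per-gene distribution of pairwise correlations, then testing whether a gene category is enriched in a given shape) can be illustrated with a minimal sketch. This is not the authors' implementation. The expression values in `expr` are invented toy data, the function names are hypothetical, and `toy_shape` uses three crude labels based only on the fraction of positive correlations as a stand-in for the eight template shapes, whose exact definitions are not given in the abstract.

```python
from math import comb, sqrt

# Toy expression matrix (invented values): gene -> expression across 4 samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_profile(expr, gene):
    """Correlations between `gene` and every other gene in the dataset."""
    return [pearson(expr[gene], expr[g]) for g in expr if g != gene]


def toy_shape(correlations):
    """Assumed simplification: label a distribution by its sign balance.

    The pipeline in the abstract matches against eight templates; this
    stand-in only looks at the fraction of positive correlations.
    """
    frac_pos = sum(r > 0 for r in correlations) / len(correlations)
    if frac_pos >= 0.75:
        return "mostly_positive"
    if frac_pos <= 0.25:
        return "mostly_negative"
    return "balanced"


def hypergeom_pvalue(k, M, n, N):
    """Upper-tail hypergeometric probability P(X >= k).

    M: total genes; n: genes with the property (e.g. regulators);
    N: genes assigned to the shape; k: genes in the shape with the property.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


# Assign every gene a toy shape label.
shapes = {g: toy_shape(coexpression_profile(expr, g)) for g in expr}

# Enrichment example with made-up counts: of 1000 genes, 100 are regulators;
# a shape holds 50 genes, 15 of which are regulators.
p_value = hypergeom_pvalue(15, 1000, 100, 50)
```

A small `p_value` here would indicate that regulators occur in that shape more often than expected by chance. On real data, the same test would be repeated for each shape and each gene category, so some form of multiple-testing correction would normally follow.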