Laboratory for Statistical Genomics and Systems Biology, Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati OH 45267-0056, USA.
BMC Bioinformatics. 2010 May 7;11:234. doi: 10.1186/1471-2105-11-234.
Differential co-expression analysis is an emerging strategy for characterizing disease-related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims to identify genes that are co-expressed in one set of samples but not in the other.
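The pre-defined (supervised) setting described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the co-expression measure and uses simulated data; the function name `differential_coexpression` and all simulated values are hypothetical and are not taken from the paper.

```python
import numpy as np

def differential_coexpression(expr_a, expr_b):
    """Difference in gene-gene Pearson correlation between two sample sets.

    expr_a, expr_b: arrays of shape (n_genes, n_samples), one per
    pre-defined sample set. Pearson correlation is an assumed choice here.
    """
    corr_a = np.corrcoef(expr_a)  # gene-by-gene correlation in set A
    corr_b = np.corrcoef(expr_b)  # gene-by-gene correlation in set B
    return corr_a - corr_b

# Simulated (not real) data: genes 0 and 1 share a latent factor in set A only.
rng = np.random.default_rng(0)
n_samples = 200
factor = rng.normal(size=n_samples)
expr_a = np.vstack([
    factor + 0.1 * rng.normal(size=n_samples),  # gene 0, driven by the factor
    factor + 0.1 * rng.normal(size=n_samples),  # gene 1, driven by the factor
    rng.normal(size=n_samples),                 # gene 2, independent
])
expr_b = rng.normal(size=(3, n_samples))        # all genes independent in set B

diff = differential_coexpression(expr_a, expr_b)
```

A large positive entry `diff[0, 1]` flags genes 0 and 1 as co-expressed in set A but not in set B, while entries involving gene 2 stay near zero.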
We developed a novel probabilistic framework for jointly uncovering contexts (i.e., groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure used in this model for grouping biological samples is based on the clustering structure of genes within each sample rather than on traditional measures of similarity in gene expression levels. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERalpha regulatory network.
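The distinction between similarity in expression level and similarity in co-expression structure can be made concrete with a toy example. This is only an illustrative sketch on simulated data, using a plain correlation comparison between sample groups; it is not the probabilistic model described above, and the helper names `simulate`, `level_distance`, and `structure_distance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300  # samples per simulated group

def simulate(level_shift, co_regulated):
    """Simulate two genes over n samples (toy data, not from the paper)."""
    noise = rng.normal(size=(2, n))
    if co_regulated:
        factor = rng.normal(size=n)  # shared latent regulator
        genes = np.vstack([factor, factor]) + 0.2 * noise
    else:
        genes = noise                # the two genes vary independently
    return genes + level_shift

group_1 = simulate(0.0, True)    # baseline level, genes co-expressed
group_2 = simulate(5.0, True)    # shifted level, genes still co-expressed
group_3 = simulate(0.0, False)   # baseline level, genes not co-expressed

def level_distance(x, y):
    """Euclidean distance between mean expression profiles."""
    return float(np.linalg.norm(x.mean(axis=1) - y.mean(axis=1)))

def structure_distance(x, y):
    """Absolute difference in the correlation between the two genes."""
    return float(abs(np.corrcoef(x)[0, 1] - np.corrcoef(y)[0, 1]))
```

Under these assumptions, `group_1` and `group_2` are far apart by `level_distance` but close by `structure_distance`, whereas `group_1` and `group_3` show the opposite pattern. A level-based measure would pair groups 1 and 3; a structure-based measure would pair groups 1 and 2.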
We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.