Yuan Shinsheng, Li Ker-Chau
Institute of Statistical Science, Academia Sinica, 128, Section 2, Academia Road, Nankang, Taipei 115, Taiwan, ROC.
Bioinformatics. 2007 Nov 15;23(22):3039-47. doi: 10.1093/bioinformatics/btm457. Epub 2007 Sep 10.
High-throughput expression profiling allows researchers to study gene activities globally. Genes with similar expression profiles are likely to encode proteins that may participate in a common structural complex, metabolic pathway or biological process. Many clustering, classification and dimension-reduction approaches, which are powerful in elucidating expression data, are based on this rationale. However, the converse of this common perception can be misleading. In fact, many biologically related genes turn out to be uncorrelated in expression.
In this article, we present a novel method for investigating gene co-expression patterns. We assume that the correlation between functionally related genes can be strengthened or weakened according to changes in some relevant, yet unknown, cellular states. We develop a context-dependent clustering (CDC) method to model the cellular state variable. We apply it to the study of transcriptional regulation in Saccharomyces cerevisiae, using the Stanford cell-cycle gene expression data. We investigate the co-expression patterns between transcription factors (TFs) and their target genes (TGs) predicted by the genome-wide location analysis of Harbison et al. Since a TF regulates the expression of its TGs, correlation between TF and TG expression profiles can be expected. But as many authors have observed, the expression of transcription factors does not correlate well with the expression of their target genes. Instead of attributing this mainly to the lack of correlation between transcript abundance and TF activity, we search for cellular conditions that would facilitate the TF-TG correlation. The results for sulfur amino acid pathway regulation by MET4, respiratory gene regulation by HAP4, and mitotic cell cycle regulation by ACE2/SWI5 are discussed in detail. Our method suggests a new way to understand complex biological systems from microarray data.
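The idea that a TF-TG correlation can be strong under one cellular state and absent under another can be illustrated with a toy calculation. The sketch below is not the CDC method itself, which the abstract does not specify; it assumes the state variable is already known and binary, uses synthetic data, and the names `correlation_by_state`, `tf`, `tg` and `state` are hypothetical.

```python
import numpy as np

def correlation_by_state(tf, tg, state):
    """Pearson correlation between two expression profiles,
    computed separately within each value of a discrete state label.
    Illustrative only: the state is assumed known, unlike in CDC."""
    tf, tg, state = map(np.asarray, (tf, tg, state))
    result = {}
    for s in np.unique(state):
        mask = state == s  # conditions (arrays) belonging to this state
        result[s.item()] = float(np.corrcoef(tf[mask], tg[mask])[0, 1])
    return result

# Synthetic data (an assumption, not from the paper):
# the target follows the TF only when state == 1.
rng = np.random.default_rng(0)
n = 200
state = rng.integers(0, 2, size=n)
tf = rng.normal(size=n)
noise = rng.normal(size=n)
tg = np.where(state == 1, tf + 0.3 * noise, noise)

overall = float(np.corrcoef(tf, tg)[0, 1])
by_state = correlation_by_state(tf, tg, state)
print("overall:", round(overall, 2))
print("by state:", {k: round(v, 2) for k, v in by_state.items()})
```

In this toy setting the pooled correlation is diluted by the conditions in which the TF has no effect, while the within-state correlation recovers the relationship. The method described in the abstract addresses the harder case in which the state variable is unknown and must be modeled.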