Steinhauser Dirk, Junker Björn H, Luedemann Alexander, Selbig Joachim, Kopka Joachim
Max Planck Institute of Molecular Plant Physiology, 14476 Golm, Germany.
Bioinformatics. 2004 Aug 12;20(12):1928-39. doi: 10.1093/bioinformatics/bth182. Epub 2004 Mar 25.
A major issue in computational biology is the reconstruction of functional relationships among genes, for example the definition of regulatory or biochemical pathways. One step towards this aim is the elucidation of transcriptional units, which are characterized by co-responding changes in mRNA expression levels. These units of genes will allow the generation of hypotheses about the functional interrelationships of their members. Thus, the focus of analysis is currently moving from well-established functional assignment through comparison of protein and DNA sequences towards analysis of transcriptional co-response. Tools that deduce common control of gene expression have the potential to complement and extend routine BLAST comparisons, because gene function may be inferred from common transcriptional control.
We present a strategy for co-clustering genome sequence information and gene expression data, which was applied to identify transcriptional units within diverse compendia of expression profiles. The phenomenon of prokaryotic operons was selected as an ideal test case to generate well-founded hypotheses about transcriptional units. The existence of overlapping and ambiguous operon definitions allowed the investigation of constitutive and conditional expression of transcriptional units in independent gene expression experiments of Escherichia coli. Our approach allowed identification of operons with high accuracy. Furthermore, both constitutive mRNA co-response and conditional differences became apparent. Thus, we were able to gain insight into the possible biological relevance of gene co-response. We conclude that the suggested strategy will be generally applicable to the identification of transcriptional units beyond the chosen example of E. coli operons.
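The abstract does not specify the clustering procedure, so the following is only a minimal sketch of the general idea of combining genome position with expression co-response. It assumes, purely for illustration, that co-response is measured by the Pearson correlation of expression profiles and that chromosomally adjacent genes on the same strand are merged into one putative unit when their correlation reaches a threshold. The function names, the threshold of 0.8, the gene names and the profile values are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-response information
    return cov / sqrt(var_x * var_y)


def putative_units(genes, profiles, threshold=0.8):
    """Group adjacent genes into putative transcriptional units.

    genes:     list of (name, strand) tuples in chromosomal order
    profiles:  dict mapping gene name to a list of expression values
    threshold: hypothetical minimum correlation for merging neighbours
    """
    if not genes:
        return []
    units = []
    current = [genes[0][0]]
    for (prev, prev_strand), (cur, cur_strand) in zip(genes, genes[1:]):
        co_response = pearson(profiles[prev], profiles[cur])
        if prev_strand == cur_strand and co_response >= threshold:
            current.append(cur)    # extend the current unit
        else:
            units.append(current)  # close the unit and start a new one
            current = [cur]
    units.append(current)
    return units


# Toy data, invented for illustration only.
genes = [("geneA", "+"), ("geneB", "+"), ("geneC", "+"), ("geneD", "-")]
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [4.1, 3.0, 2.0, 1.0],
}
units = putative_units(genes, profiles)
print(units)
```

In this toy example geneA and geneB are merged because they are adjacent, share a strand and co-respond strongly, whereas geneC is split off by its anti-correlated profile and geneD by its opposite strand. Running the same grouping separately on independent experiment compendia would be one simple way to contrast constitutive with conditional co-response, again as an assumption rather than the authors' procedure.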
The analyses of E. coli transcript data presented here are available upon request or at http://csbdb.mpimp-golm.mpg.de/