ARC Centre of Excellence in Plant Energy Biology, CMS Building M310 University of Western Australia, 35 Stirling Highway, Crawley 6009, Western Australia, Australia.
Plant Methods. 2006 Apr 11;2:8. doi: 10.1186/1746-4811-2-8.
Uncovering the key sequence elements in gene promoters that regulate the expression of plant genomes is a huge task that will require a series of complementary methods for prediction, substantial innovations in experimental validation and a much greater understanding of the role of combinatorial control in the regulation of plant gene expression.
To contribute to this larger process and to provide alternatives to existing prediction methods, we have developed several tools in the statistical package R. ModuleFinder identifies sets of genes and treatments that we have found to be valuable for analysing the mechanisms underlying gene co-expression. CoReg then links the hierarchical clustering of these co-expressed sets with frequency tables of promoter elements. These promoter elements can be drawn from known elements or from all possible combinations of nucleotides in elements of various lengths. The resulting sets of promoter elements represent putative cis-acting regulatory elements common to sets of co-expressed genes and can be prioritised for experimental testing. We have used these new tools to analyse the response of transcripts for nuclear genes encoding mitochondrial proteins in Arabidopsis to a range of chemical stresses. ModuleFinder provided a subset of co-expressed gene modules that were more logically related to biological functions than subsets derived from traditional hierarchical clustering techniques. Importantly, ModuleFinder linked responses in transcripts for electron transport chain components, carbon metabolism enzymes and solute transporter proteins. CoReg identified several promoter motifs that helped to explain the patterns of expression observed.
ModuleFinder identifies sets of genes and treatments that are useful for analysing the mechanisms behind co-expression. CoReg links the clustering tree of expression-based relationships in these sets with frequency tables of promoter elements. These sets of promoter elements represent putative cis-acting regulatory elements for sets of genes, and can then be tested experimentally. We consider that these tools, both built on an open-source software product, provide valuable alternatives for prioritising promoter elements for experimental analysis.
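The idea of tabulating promoter elements drawn from "all possible combinations of nucleotides" for a co-expressed gene set can be illustrated with a minimal sketch. It is written in Python rather than R and is not the CoReg implementation; the function names, the toy promoter sequences and the gene labels are all hypothetical, and the `min_fraction` threshold is an assumption made for illustration only.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible nucleotide word of length k."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    n = len(word)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == word)


def frequency_table(promoters, k):
    """Build {gene: {kmer: count}} for every k-mer in every promoter."""
    words = all_kmers(k)
    return {
        gene: {w: count_overlapping(seq.upper(), w) for w in words}
        for gene, seq in promoters.items()
    }


def shared_elements(table, genes, min_fraction=1.0):
    """Return k-mers present in at least min_fraction of the given genes."""
    genes = list(genes)
    hits = []
    for word in table[genes[0]]:
        present = sum(1 for g in genes if table[g][word] > 0)
        if present / len(genes) >= min_fraction:
            hits.append(word)
    return sorted(hits)


# Hypothetical toy promoters; real input would be upstream genomic sequence.
promoters = {
    "geneA": "TTACGTGGA",
    "geneB": "CCACGTGTT",
    "geneC": "GGGGTTTTT",
}
table = frequency_table(promoters, k=4)

# Assume geneA and geneB were grouped into one co-expressed module.
module = ["geneA", "geneB"]
print(shared_elements(table, module))  # ['ACGT', 'CGTG']
```

In this toy case the two elements shared by the assumed module are absent from `geneC`, which is the kind of contrast that would make an element a candidate for experimental testing. A real analysis would also need a statistical comparison against background promoter frequencies, which this sketch omits.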