Clements Maarten, van Someren Eugene P, Knijnenburg Theo A, Reinders Marcel J T
Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2600 GA Delft, the Netherlands.
Genomics Proteomics Bioinformatics. 2007 May;5(2):86-101. doi: 10.1016/S1672-0229(07)60019-9.
The common approach to finding co-regulated genes is to cluster genes by their expression profiles. However, because any single dataset carries limited information, genes in the same cluster may be co-expressed without being co-regulated. In this paper, we propose integrating known transcription factor binding site information and gene expression data into a single clustering scheme. This scheme finds clusters of co-regulated genes that are not only expressed similarly under the measured conditions but also share a regulatory structure that may explain their common regulation. We demonstrate the utility of this approach on a microarray dataset of yeast grown under different nutrient and oxygen limitations. Our integrated clustering method recovers many regulatory modules that are consistent with current biological knowledge and provides a deeper understanding of the underlying process. Compared with clustering based solely on gene expression, the added value of our approach is its ability to uncover clusters of genes that are involved in more specific biological processes and are evidently regulated by a set of transcription factors.
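The abstract does not spell out how the two data sources are combined, so the following is only a minimal sketch of one plausible way to integrate them, not the authors' algorithm. It assumes a weighted sum of an expression distance (1 minus Pearson correlation, rescaled to 0..1) and a binding distance (Jaccard distance between sets of bound transcription factors), followed by average-linkage agglomerative clustering. The gene names, TF names, the weight `alpha`, and the cut-off `threshold` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two expression profiles (range 0..2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of bound transcription factors."""
    union = a | b
    if not union:
        return 1.0  # no binding evidence for either gene
    return 1.0 - len(a & b) / len(union)


def combined_distance(g1, g2, expression, binding, alpha):
    """Weighted sum of expression and binding distances (assumed combination rule)."""
    d_expr = pearson_distance(expression[g1], expression[g2]) / 2.0  # scale to 0..1
    d_bind = jaccard_distance(binding[g1], binding[g2])
    return alpha * d_expr + (1.0 - alpha) * d_bind


def cluster(genes, expression, binding, alpha=0.5, threshold=0.3):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        total = sum(combined_distance(a, b, expression, binding, alpha)
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under the combined distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: four genes, four conditions, made-up TF binding sets.
expression = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [1, 2, 3, 5],
    "G4": [4, 3, 2, 1],
}
binding = {
    "G1": {"TF_A", "TF_B"},
    "G2": {"TF_A", "TF_B"},
    "G3": {"TF_C"},
    "G4": {"TF_C"},
}
genes = sorted(expression)

expression_only = cluster(genes, expression, binding, alpha=1.0)
integrated = cluster(genes, expression, binding, alpha=0.5)
print(expression_only)
print(integrated)
```

In this toy setup, clustering on expression alone (`alpha=1.0`) groups G3 with G1 and G2 because their profiles are highly correlated, whereas the integrated distance keeps G3 apart because it shares no bound transcription factor with them. That mirrors the abstract's distinction between co-expressed and co-regulated genes, under the stated assumptions.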