Amos Tanay, Roded Sharan, Martin Kupiec, Ron Shamir
School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2981-6. doi: 10.1073/pnas.0308661100. Epub 2004 Feb 18.
The dissection of complex biological systems is a challenging task, made difficult by the size of the underlying molecular network and the heterogeneous nature of the control mechanisms involved. Novel high-throughput techniques are generating massive data sets on various aspects of such systems. Here, we perform analysis of a highly diverse collection of genomewide data sets, including gene expression, protein interactions, growth phenotype data, and transcription factor binding, to reveal the modular organization of the yeast system. By integrating experimental data of heterogeneous sources and types, we are able to perform analysis on a much broader scope than previous studies. At the core of our methodology is the ability to identify modules, namely, groups of genes with statistically significant correlated behavior across diverse data sources. Numerous biological processes are revealed through these modules, which also obey global hierarchical organization. We use the identified modules to study the yeast transcriptional network and predict the function of >800 uncharacterized genes. Our analysis framework, SAMBA (Statistical-Algorithmic Method for Bicluster Analysis), enables the processing of current and future sources of biological information and is readily extendable to experimental techniques and higher organisms.
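The idea of a module as a group of genes that behave consistently across several data types can be illustrated with a small sketch. It is not the SAMBA implementation: the gene and property names, the toy data, the log-ratio scoring rule, and the `p_module` parameter are all assumptions made for illustration. Each gene is mapped to the set of properties it shows (an expression response, binding by a transcription factor, a growth phenotype), and a candidate module is a gene subset paired with a property subset. The score rewards gene-property pairs that are present and penalizes those that are absent, relative to the overall density of the data.

```python
import math

# Toy, invented data: each gene maps to the set of properties it shows.
# Properties mix data types (expression, TF binding, growth phenotype).
GENE_PROPERTIES = {
    "geneA": {"expr:heat_shock_up", "tf:bound_by_TF1", "pheno:sensitive_drugX"},
    "geneB": {"expr:heat_shock_up", "tf:bound_by_TF1", "pheno:sensitive_drugX"},
    "geneC": {"expr:heat_shock_up", "tf:bound_by_TF1"},
    "geneD": {"expr:nitrogen_down"},
    "geneE": {"tf:bound_by_TF2", "expr:nitrogen_down"},
}


def background_rate(data):
    """Fraction of all possible gene-property pairs that are present."""
    all_props = set().union(*data.values())
    present = sum(len(props) for props in data.values())
    return present / (len(data) * len(all_props))


def module_density(genes, props, data):
    """Fraction of gene-property pairs inside the candidate module that are present."""
    hits = sum(1 for g in genes for p in props if p in data[g])
    return hits / (len(genes) * len(props))


def module_score(genes, props, data, p_module=0.9):
    """Log-ratio score of a candidate module (hypothetical scoring rule).

    Assumes pairs inside a true module are present with probability
    p_module, while pairs elsewhere follow the background rate.
    """
    p_bg = background_rate(data)
    hit_weight = math.log(p_module / p_bg)               # positive when p_module > p_bg
    miss_weight = math.log((1 - p_module) / (1 - p_bg))  # negative when p_module > p_bg
    hits = sum(1 for g in genes for p in props if p in data[g])
    misses = len(genes) * len(props) - hits
    return hits * hit_weight + misses * miss_weight


# A coherent candidate: three genes sharing properties from three data types.
coherent_genes = ["geneA", "geneB", "geneC"]
coherent_props = ["expr:heat_shock_up", "tf:bound_by_TF1", "pheno:sensitive_drugX"]

# An incoherent candidate: genes paired with properties they do not show.
scattered_genes = ["geneD", "geneE"]
scattered_props = ["expr:heat_shock_up", "tf:bound_by_TF1"]

if __name__ == "__main__":
    print("coherent score: ", module_score(coherent_genes, coherent_props, GENE_PROPERTIES))
    print("scattered score:", module_score(scattered_genes, scattered_props, GENE_PROPERTIES))
```

In this toy setting the coherent candidate scores above zero and the scattered one below zero. A real analysis would also need a search over candidate subsets and a significance assessment, both of which this sketch omits.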