von Mering Christian, Zdobnov Evgeny M, Tsoka Sophia, Ciccarelli Francesca D, Pereira-Leal Jose B, Ouzounis Christos A, Bork Peer
European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany.
Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15428-33. doi: 10.1073/pnas.2136809100. Epub 2003 Dec 12.
The analysis of completely sequenced genomes uncovers an astonishing variability between species in terms of gene content and order. During genome history, the genes are frequently rearranged, duplicated, lost, or transferred horizontally between genomes. These events appear to be stochastic, yet they are under selective constraints resulting from the functional interactions between genes. These genomic constraints form the basis for a variety of techniques that employ systematic genome comparisons to predict functional associations among genes. The most powerful techniques to date are based on conserved gene neighborhood, gene fusion events, and common phylogenetic distributions of gene families. Here we show that these techniques, if integrated quantitatively and applied to a sufficiently large number of genomes, have reached a resolution which allows the characterization of function at a higher level than that of the individual gene: global modularity becomes detectable in a functional protein network. In Escherichia coli, the predicted modules can be benchmarked by comparison to known metabolic pathways. We found as many as 74% of the known metabolic enzymes clustering together in modules, with an average pathway specificity of at least 84%. The modules extend beyond metabolism, and have led to hundreds of reliable functional predictions both at the protein and pathway level. The results indicate that modularity in protein networks is intrinsically encoded in present-day genomes.
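One of the three techniques named in the abstract, common phylogenetic distributions of gene families, can be illustrated with a minimal sketch. The abstract does not describe the scoring scheme, so everything below is an assumption for illustration only: the toy presence/absence data, the family and genome names, the use of Jaccard similarity as the comparison measure, and the 0.6 threshold. It is not the authors' method.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each gene family is mapped
# to the set of genomes in which at least one member is present.
profiles = {
    "famA": {"g1", "g2", "g4"},
    "famB": {"g1", "g2", "g4"},
    "famC": {"g3", "g5"},
    "famD": {"g1", "g3", "g5"},
}


def jaccard(a, b):
    """Genomes shared by both families divided by genomes containing either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurrence_links(profiles, threshold=0.6):
    """Return (family, family, score) for pairs whose distributions are similar.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    links = []
    for x, y in combinations(sorted(profiles), 2):
        score = jaccard(profiles[x], profiles[y])
        if score >= threshold:
            links.append((x, y, score))
    return links


links = co_occurrence_links(profiles)
for x, y, score in links:
    print(f"{x} - {y}: {score:.2f}")
```

Families that are gained and lost together across genomes receive a high score and become candidate functional associations. In an integrated approach such as the one the abstract describes, a score of this kind would be combined with evidence from gene neighborhood and gene fusion before modules are derived from the resulting network.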