Ye Yuzhen, Godzik Adam
Program in Bioinformatics and Systems Biology, The Burnham Institute, La Jolla, California 92037, USA.
Genome Res. 2004 Mar;14(3):343-53. doi: 10.1101/gr.1610504.
We have developed a set of graph theory-based tools, which we call Comparative Analysis of Protein Domain Organization (CADO), to survey and compare the protein domain organizations of different organisms. In the language of CADO, the organization of protein domains in a given organism is shown as a domain graph in which protein domains are represented as vertices, and domain combinations, defined as instances of two domains found in one protein, are represented as edges. CADO provides a new way to analyze and compare whole proteomes, including identifying the consensus and the differences in domain organization between organisms. CADO was used to analyze and compare >50 bacterial, archaeal, and eukaryotic genomes. Examples and overviews presented here include an analysis of the modularity of domain graphs and a functional study of domains based on graph topology. We also report the results of comparing the domain graphs of two organisms, Pyrococcus horikoshii (an extremophile) and Haemophilus influenzae (a parasite with a reduced genome), with those of other organisms. Our comparison provides new insights into the genome organization of these organisms. Finally, we report the specific domain combinations characterizing the three kingdoms of life, and the kingdom "signature" domain organizations derived from those specific domain combinations.
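As a rough illustration of the domain-graph model described above, the sketch below builds a domain graph from per-protein domain lists, compares the graphs of two organisms (shared combinations as the consensus, the rest as differences), and flags combinations observed in only one kingdom. It is not the CADO implementation. It assumes a simple input format (protein ID mapped to a list of domain IDs), treats every pair of distinct domains in a protein as one undirected edge, ignores repeats of the same domain, and uses a simplified "seen in exactly one kingdom" rule; the function names and the toy domain labels D1 to D4 are hypothetical.

```python
from itertools import combinations

# Hypothetical input format: protein ID -> domain IDs found in that protein.
Proteome = dict[str, list[str]]
Edge = tuple[str, str]


def domain_graph(proteome: Proteome) -> tuple[set[str], set[Edge]]:
    """Vertices are domains; edges are domain combinations, i.e. two
    distinct domains found in the same protein."""
    vertices: set[str] = set()
    edges: set[Edge] = set()
    for domains in proteome.values():
        unique = sorted(set(domains))  # assumption: repeats of one domain add no edge
        vertices.update(unique)
        # Sorting gives each undirected edge one canonical form (a, b) with a < b.
        edges.update(combinations(unique, 2))
    return vertices, edges


def compare_graphs(edges_a: set[Edge], edges_b: set[Edge]) -> dict[str, set[Edge]]:
    """Consensus = combinations present in both organisms; the rest are differences."""
    return {
        "consensus": edges_a & edges_b,
        "only_a": edges_a - edges_b,
        "only_b": edges_b - edges_a,
    }


def kingdom_specific(edges_by_org: dict[str, set[Edge]],
                     kingdom_of: dict[str, str]) -> dict[str, set[Edge]]:
    """Combinations observed in organisms of exactly one kingdom
    (a simplified rule assumed for illustration)."""
    kingdoms_per_edge: dict[Edge, set[str]] = {}
    for org, edges in edges_by_org.items():
        for edge in edges:
            kingdoms_per_edge.setdefault(edge, set()).add(kingdom_of[org])
    specific: dict[str, set[Edge]] = {}
    for edge, kingdoms in kingdoms_per_edge.items():
        if len(kingdoms) == 1:
            (kingdom,) = kingdoms
            specific.setdefault(kingdom, set()).add(edge)
    return specific


if __name__ == "__main__":
    # Toy proteomes with made-up domain labels (not real data).
    org_a = {"p1": ["D1", "D2", "D3"], "p2": ["D2", "D2"]}
    org_b = {"q1": ["D1", "D2"], "q2": ["D3", "D4"]}
    _, edges_a = domain_graph(org_a)
    _, edges_b = domain_graph(org_b)
    print(compare_graphs(edges_a, edges_b))
    print(kingdom_specific({"A": edges_a, "B": edges_b},
                           {"A": "Bacteria", "B": "Eukaryota"}))
```

In this toy run, (D1, D2) is the only shared combination, so it lands in the consensus and is excluded from the kingdom-specific sets. A real analysis would likely require a combination to appear in some minimum fraction of a kingdom's organisms rather than in a single one.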