Vogel Christine, Berzuini Carlo, Bashton Matthew, Gough Julian, Teichmann Sarah A
MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK.
J Mol Biol. 2004 Feb 20;336(3):809-23. doi: 10.1016/j.jmb.2003.12.026.
Domains are the evolutionary units that make up proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new domain arrangements. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts, with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains, and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 combinations (1203 two-domain and 166 three-domain) that are statistically significantly over-represented relative to the occurrence and versatility of their individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. Investigating the structural and functional relationships of the domains that form these popular combinations would therefore be particularly useful for understanding multi-domain protein function and evolution, as well as for genome annotation. These and other supra-domains were analysed for their versatility, their duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationship between the component domains: either the combined function requires a precise geometry of the two domains, so the interface is tightly constrained, or the exact orientation of the domains matters less and they are spatially separate. Frequently, the role of a supra-domain becomes clear only once its three-dimensional structure is known. Since the structure is known for only a quarter of the supra-domains, we provide a list of the most important supra-domains of unknown structure as potential targets for structural genomics projects.
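The abstract describes two steps: finding domain combinations that recur in different protein contexts with different partners, then testing whether each combination is over-represented given how common its component domains are. The sketch below illustrates both steps under stated assumptions. It is not the authors' method, and the abstract does not specify their statistical model. Each protein is represented as an ordered (N- to C-terminal) tuple of hypothetical domain labels. A "context" is a distinct architecture in which the combination has at least one further partner domain. For over-representation, a simple stand-in null model is used: the N-terminal and C-terminal domains of an adjacent pair are assumed to be drawn independently, and the observed count is scored with a Poisson upper tail. All function names here are illustrative only.

```python
from collections import Counter, defaultdict
from math import exp, factorial


def contiguous_combos(arch, k=2):
    """All runs of k neighbouring domains in one architecture, N- to C-terminal."""
    return [tuple(arch[i:i + k]) for i in range(len(arch) - k + 1)]


def recurring_combos(proteins, k=2, min_contexts=2):
    """Map each k-domain combination to the number of distinct architectures in which
    it occurs together with at least one further partner domain; keep only those
    seen in >= min_contexts such architectures (a rough proxy for supra-domains)."""
    contexts = defaultdict(set)
    for arch in proteins:
        for combo in contiguous_combos(arch, k):
            contexts[combo].add(tuple(arch))
    result = {}
    for combo, archs in contexts.items():
        with_partners = {a for a in archs if len(a) > k}  # combo plus other domains
        if len(with_partners) >= min_contexts:
            result[combo] = len(with_partners)
    return result


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the lower tail."""
    return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))


def over_representation(proteins):
    """For each adjacent domain pair: (observed count, expected count, Poisson p-value).
    Assumed null model: N- and C-terminal domains of a pair are independent."""
    pairs = [p for arch in proteins for p in contiguous_combos(arch, 2)]
    n = len(pairs)
    left = Counter(a for a, _ in pairs)   # how often each domain is the N-terminal member
    right = Counter(b for _, b in pairs)  # how often each domain is the C-terminal member
    scores = {}
    for (a, b), observed in Counter(pairs).items():
        expected = n * (left[a] / n) * (right[b] / n)
        scores[(a, b)] = (observed, expected, poisson_sf(observed, expected))
    return scores


# Toy, made-up architectures; labels "A".."E" are placeholders, not real superfamilies.
proteins = [
    ("A", "B", "C"),
    ("B", "C", "D"),
    ("B", "C"),
    ("A", "D"),
    ("E", "B", "C"),
]
```

With this toy input, the pair ("B", "C") recurs with different partners in three architectures. It is observed four times against an expected two under the independence null. A real analysis would need many genomes, a null model that also accounts for domain versatility as the abstract describes, and a correction for multiple testing.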