Domain combinations in archaeal, eubacterial and eukaryotic proteomes.

Author information

Apic G, Gough J, Teichmann S A

Affiliation

Laboratory of Molecular Biology, MRC, Hills Road, Cambridge, CB2 2QH, UK.

Publication information

J Mol Biol. 2001 Jul 6;310(2):311-25. doi: 10.1006/jmbi.2001.4776.

Abstract

There is a limited repertoire of domain families that are duplicated and combined in different ways to form the set of proteins in a genome. Proteins are gene products, and at the level of genes, duplication, recombination, fusion and fission are the processes that produce new genes. We attempt to gain an overview of these processes by studying the evolutionary units in proteins, domains, in the protein sequences of 40 genomes. The domain and superfamily definitions in the Structural Classification of Proteins Database are used, so that we can view all pairs of adjacent domains in genome sequences in terms of their superfamily combinations. We find 783 out of the 859 superfamilies in SCOP in these genomes, and the 783 families occur in 1307 pairwise combinations. Most families are observed in combination with one or two other families, while a few families are very versatile in their combinatorial behaviour; 209 families do not make combinations with other families. This type of pattern can be described as a scale-free network. We also study the N to C-terminal orientation of domain pairs and domain repeats. The phylogenetic distribution of domain combinations is surveyed, to establish the extent of common and kingdom-specific combinations. Of the kingdom-specific combinations, significantly more combinations consist of families present in all three kingdoms than of families present in one or two kingdoms. Hence, we are led to conclude that recombination between common families, as compared to the invention of new families and recombination among these, has also been a major contribution to the evolution of kingdom-specific and species-specific functions in organisms in all three kingdoms. Finally, we compare the set of the domain combinations in the genomes to those in the RCSB Protein Data Bank, and discuss the implications for structural genomics.
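The counting described in the abstract (adjacent domain pairs, and the number of distinct partner superfamilies per superfamily) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes each protein has already been reduced to an N-to-C ordered list of superfamily labels, that a combination is counted regardless of orientation, and that a tandem repeat of the same superfamily is not a combination with another family. The function names and the toy labels `A` to `E` are hypothetical.

```python
from collections import Counter


def combination_partners(proteins):
    """Map each superfamily to the set of other superfamilies found adjacent to it.

    `proteins` is an iterable of domain architectures, each an N-to-C ordered
    list of superfamily labels (an assumed input format).
    """
    partners = {}
    for architecture in proteins:
        # Register every superfamily, including ones that never combine.
        for superfamily in architecture:
            partners.setdefault(superfamily, set())
        # Walk over adjacent domain pairs in N-to-C order.
        for left, right in zip(architecture, architecture[1:]):
            if left != right:  # assumption: tandem repeats are not counted
                partners[left].add(right)
                partners[right].add(left)
    return partners


def unordered_combinations(partners):
    """Return the set of pairwise combinations, ignoring orientation."""
    return {
        frozenset((a, b))
        for a, neighbours in partners.items()
        for b in neighbours
    }


def partner_count_histogram(partners):
    """Count how many superfamilies have 0, 1, 2, ... distinct partners."""
    return Counter(len(neighbours) for neighbours in partners.values())


# Toy data: A is a versatile superfamily, E never combines with another family.
toy_proteins = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "B"], ["E"]]
toy_partners = combination_partners(toy_proteins)

print(len(unordered_combinations(toy_partners)))  # 3 pairwise combinations
print(dict(partner_count_histogram(toy_partners)))  # {3: 1, 1: 3, 0: 1}
```

In a scale-free pattern of the kind the abstract describes, a histogram like this one would be dominated by superfamilies with few partners, with a small number of highly connected ones; the toy data only mimics that shape and says nothing about the real counts.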

