The relationship between domain duplication and recombination.
Author information
Vogel Christine, Teichmann Sarah A, Pereira-Leal Jose
Affiliation
MRC Laboratory of Molecular Biology, Hills Rd, Cambridge CB2 2QH, UK.
Publication information
J Mol Biol. 2005 Feb 11;346(1):355-65. doi: 10.1016/j.jmb.2004.11.050. Epub 2004 Dec 23.
Protein domains represent the basic evolutionary units that form proteins. Domain duplication and shuffling by recombination are probably the most important forces driving protein evolution and hence the complexity of the proteome. While the duplication of whole genes as well as domain-encoding exons increases the abundance of domains in the proteome, domain shuffling increases versatility, i.e. the number of distinct contexts in which a domain can occur. Here, we describe a comprehensive, genome-wide analysis of the relationship between these two processes. We observe a strong and robust correlation between domain versatility and abundance: domains that occur more often also have many different combination partners. This supports the view that domain recombination occurs in a random way. However, we do not observe all the different combinations that are expected from a simple random recombination scenario, and this is due to frequent duplication of specific domain combinations. When we simulate the evolution of the protein repertoire considering stochastic recombination of domains followed by extensive duplication of the combinations, we approximate the observed data well. Our analyses are consistent with a stochastic process that governs domain recombination and thus protein divergence with respect to domains within a polypeptide chain. At the same time, they support a scenario in which domain combinations are formed only once during the evolution of the protein repertoire, and are then duplicated to various extents. The extent of duplication of different combinations varies widely and, in nature, will depend on selection for the domain combination based on its function. Some of the pair-wise domain combinations that are highly duplicated also recur frequently with other partner domains, and thus represent evolutionary units larger than single protein domains, which we term "supra-domains".
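The two quantities in the abstract, and the "recombine once, then duplicate" scenario, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes that each protein is an ordered tuple of domain names, that abundance is the number of occurrences of a domain, and that versatility is the number of distinct adjacent partner domains. The function names, the toy domain labels, and the Pareto-distributed copy numbers are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def abundance_and_versatility(architectures):
    """Return (abundance, versatility) dicts for a list of domain architectures.

    Assumption: abundance = occurrences of a domain across all proteins,
    versatility = number of distinct domains found directly adjacent to it.
    """
    abundance = defaultdict(int)
    partners = defaultdict(set)
    for arch in architectures:
        for domain in arch:
            abundance[domain] += 1
        # Adjacent pairs along the chain define combination partners.
        for left, right in zip(arch, arch[1:]):
            partners[left].add(right)
            partners[right].add(left)
    versatility = {d: len(partners[d]) for d in abundance}
    return dict(abundance), versatility


def simulate_repertoire(n_domains, n_combinations, max_copies, seed=0):
    """Toy model: form each ordered two-domain combination once at random,
    then duplicate it a variable number of times (hypothetical parameters)."""
    if n_combinations > n_domains * (n_domains - 1):
        raise ValueError("more combinations requested than ordered pairs exist")
    rng = random.Random(seed)
    domains = [f"D{i}" for i in range(n_domains)]
    unique_pairs = set()
    while len(unique_pairs) < n_combinations:
        a, b = rng.sample(domains, 2)  # stochastic recombination step
        unique_pairs.add((a, b))       # each combination is formed only once
    repertoire = []
    for pair in sorted(unique_pairs):  # sorted for reproducibility
        # Heavy-tailed copy number (an assumption), capped at max_copies.
        copies = min(max_copies, int(rng.paretovariate(1.0)))
        repertoire.extend([pair] * copies)
    return repertoire


if __name__ == "__main__":
    proteins = simulate_repertoire(n_domains=50, n_combinations=120, max_copies=40)
    abundance, versatility = abundance_and_versatility(proteins)
    for d in sorted(abundance, key=abundance.get, reverse=True)[:5]:
        print(d, abundance[d], versatility[d])
```

In such a toy repertoire the number of proteins exceeds the number of distinct combinations because duplication adds copies without adding new partners, which is the qualitative effect the abstract attributes to duplication of specific combinations.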