Reeves Gabrielle A, Dallman Timothy J, Redfern Oliver C, Akpor Adrian, Orengo Christine A
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
J Mol Biol. 2006 Jul 14;360(3):725-41. doi: 10.1016/j.jmb.2006.05.035. Epub 2006 Jun 2.
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well-populated domain structure superfamilies, each containing at least three sequence-diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation, but for a much larger dataset, and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that in 56% of highly populated superfamilies (>9 sequence-diverse relatives), some relatives show twofold or greater increases in the number of secondary structures. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions, or embellishments, in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence, they were often co-located in 3D, resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation, promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well-populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur frequently in the genomes exploit these types of embellishments more often to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets.
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
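As an illustration of the "twofold or more" statistic quoted in the abstract, the following is a minimal sketch and not the 2DSEC algorithm itself, whose details the abstract does not give. It assumes that each relative has already been reduced to a count of secondary structure elements, and it takes the fold increase to be the ratio of the largest to the smallest count within a superfamily. That definition, the function names, the superfamily labels and the counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def fold_increase(sse_counts: List[int]) -> float:
    """Ratio of the largest to the smallest secondary-structure count
    among the relatives of one superfamily (an assumed definition)."""
    if not sse_counts or min(sse_counts) <= 0:
        raise ValueError("need at least one relative with a positive count")
    return max(sse_counts) / min(sse_counts)


def fraction_expanded(
    superfamilies: Dict[str, List[int]],
    threshold: float = 2.0,
    min_relatives: int = 10,
) -> float:
    """Fraction of superfamilies with at least `min_relatives` relatives
    whose fold increase reaches `threshold`."""
    # Keep only highly populated superfamilies (>9 relatives by default).
    eligible = {
        name: counts
        for name, counts in superfamilies.items()
        if len(counts) >= min_relatives
    }
    if not eligible:
        return 0.0
    hits = sum(1 for c in eligible.values() if fold_increase(c) >= threshold)
    return hits / len(eligible)


# Hypothetical toy data: secondary-structure counts per relative.
toy = {
    "sf_A": [4, 5, 4, 6, 5, 4, 9, 5, 4, 5],  # 10 relatives, 9/4 = 2.25
    "sf_B": [6, 6, 7, 6, 7, 6, 6, 7, 6, 8],  # 10 relatives, 8/6 = 1.33
    "sf_C": [3, 15, 4],                      # too few relatives, excluded
}

if __name__ == "__main__":
    print(fraction_expanded(toy))
```

A max/min ratio is only one possible reading of "increase". Measuring each relative against a conserved core, as the abstract describes for 2DSEC, would require a structure-based alignment of secondary structures that this sketch does not attempt.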