Division of Glycoscience, School of Biotechnology, KTH - Royal Institute of Technology, AlbaNova University Center, Stockholm SE-106 91, Sweden.
BMC Evol Biol. 2012 Sep 20;12:186. doi: 10.1186/1471-2148-12-186.
The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limit functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5.
About 80% of the current sequences were assigned to 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered through future biochemical characterization. Furthermore, twenty subfamilies presently have no characterization whatsoever, and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that the sequence space of this historically and industrially important family is far from well dispersed, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins that have lost their catalytic machinery, indicating evolution towards novel functions.
Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation in glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at http://www.cazy.org/GH5.html.