Kundu Siddhartha, Sharma Rita
Department of Biochemistry, Dr. Baba Saheb Ambedkar Medical College & Hospital, New Delhi, India; Mathematical and Computational Biology, Information Technology Research Academy, Media Lab Asia, New Delhi, India; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
Front Plant Sci. 2016 Aug 12;7:1185. doi: 10.3389/fpls.2016.01185. eCollection 2016.
The glycoside hydrolase 9 (GH9) superfamily, mainly comprising the endoglucanases, is represented in all three domains of life. The current division of GH9 enzymes into three subclasses, namely A, B, and C, is based on parameters derived from sequence information alone. However, this classification is ambiguous, and is limited by the paralogous ancestry of class B and C endoglucanases and by the paucity of biochemical and structural data. Here, we extend this classification schema to putative GH9 endoglucanases present in green plants, with an emphasis on identifying novel members of the class C subset. These enzymes cleave the β(1 → 4) linkage between non-terminal adjacent D-glucopyranose residues in both amorphous and crystalline regions of cellulose. We used non-redundant plant GH9 enzymes with characterized molecular data as the training set to construct Hidden Markov Models (HMMs) and train an Artificial Neural Network (ANN). The parameters used for predicting dominant enzyme function were derived from this training set and subsequently refined on 147 sequences with available expression data. Our knowledge-based approach can ascribe differential endoglucanase activity (A, B, or C) to a query sequence with high confidence, and was used to construct a local repository of class C GH9 endoglucanases (GH9C = 241) from 32 sequenced green plants.
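The final step described in the abstract, assigning a query sequence to class A, B, or C with a stated confidence, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the ANN architecture or the HMM-derived parameters, so the sketch simply assumes one hypothetical score per class (for example, a bit score from a class-specific HMM), converts the scores to probabilities with a softmax, and leaves low-confidence queries unassigned. The function names, the `min_prob` threshold, and the example scores are all illustrative assumptions.

```python
import math

CLASSES = ("A", "B", "C")  # GH9 subclasses named in the abstract


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_class(class_scores, min_prob=0.8):
    """Pick the most probable GH9 subclass for one query sequence.

    class_scores: dict mapping "A"/"B"/"C" to a hypothetical per-class
        score (assumed here; the paper's actual features are not given).
    min_prob: hypothetical confidence cut-off; below it no class is assigned.
    Returns (label or None, probability of the top class).
    """
    probs = softmax([class_scores[c] for c in CLASSES])
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    label = CLASSES[best] if probs[best] >= min_prob else None
    return label, probs[best]


# Made-up scores for two hypothetical query sequences.
confident = assign_class({"A": 10.0, "B": 12.0, "C": 40.0})
ambiguous = assign_class({"A": 10.0, "B": 10.0, "C": 10.0})
```

In this sketch the first query is assigned to class C because its class C score dominates, while the second, with identical scores for all three classes, is left unassigned rather than forced into a class.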