Department of Biological Sciences, California State University Long Beach, Long Beach, California, USA.
Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA.
Sci Rep. 2019 Jul 12;9(1):10137. doi: 10.1038/s41598-019-46290-w.
The identification of glycoside hydrolases (GHs) for efficient polysaccharide deconstruction is essential for the development of biofuels. Here, we investigate the potential of sequential HMM-profile identification for the rapid and precise identification of the multi-domain architecture of GHs from various datasets. First, as a validation, we successfully reannotated >98% of the biochemically characterized enzymes listed in the CAZy database. Next, we analyzed the 43 million non-redundant sequences in the M5nr database and identified 322,068 unique GHs. Finally, we searched 129 assembled metagenomes retrieved from MG-RAST for environmental GHs and identified 160,790 additional enzymes. Although most identified sequences corresponded to single-domain enzymes, many contained several domains, including known accessory domains and some domains not previously identified in association with GHs. Several sequences displayed multiple catalytic domains, and a few of these potential multi-activity proteins combined potentially synergistic domains. Finally, we produced a GH5-GH10 cellulase-xylanase and a GH11-CE4 xylanase-esterase and confirmed their biochemical activities. Overall, this "gene to enzyme pipeline" provides a rationale for mining large datasets in order to identify new catalysts combining unique properties for the efficient deconstruction of polysaccharides.
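As an illustration of how a multi-domain architecture label such as GH5-GH10 might be derived from HMM-profile hits, the following is a minimal sketch and not the authors' pipeline. It assumes the hits have already been produced by a profile search and parsed into records; the `Hit` type, the `max_evalue` cutoff of 1e-5, and the rule of keeping the lowest E-value hit among overlapping hits are all hypothetical choices made for this example, and the coordinates and E-values are invented.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One profile match on a protein (hypothetical record layout)."""
    domain: str    # profile name, e.g. "GH5"
    start: int     # first matched residue (1-based, inclusive)
    end: int       # last matched residue (inclusive)
    evalue: float  # match E-value; lower is more significant


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def domain_architecture(hits: Iterable[Hit], max_evalue: float = 1e-5) -> str:
    """Build an N- to C-terminal architecture string from profile hits.

    Assumed rules (not taken from the paper): discard hits above
    max_evalue, then greedily keep the most significant hit in each
    overlapping region.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    kept: list[Hit] = []
    for hit in sorted(significant, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    kept.sort(key=lambda h: h.start)  # order domains along the sequence
    return "-".join(h.domain for h in kept)


# Invented example hits for a single protein.
hits = [
    Hit("GH10", 420, 730, 1e-60),
    Hit("GH5", 40, 330, 1e-80),
    Hit("GH26", 50, 300, 1e-8),  # overlaps the stronger GH5 hit
    Hit("CBM2", 340, 400, 0.5),  # fails the E-value cutoff
]
print(domain_architecture(hits))  # GH5-GH10
```

Sorting by E-value before resolving overlaps means the strongest match claims each region first; other tie-breaking rules (bit score, coverage) would be equally plausible.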