College of Mathematics and Computer Science, Dali University, Dali 671000, China.
College of Agriculture and Biological Science, Dali University, Dali 671000, China.
Molecules. 2024 Jul 31;29(15):3614. doi: 10.3390/molecules29153614.
Identifying the catalytic regioselectivity of enzymes remains a challenge. Compared to experimental trial-and-error approaches, computational methods such as molecular dynamics simulations provide valuable insights into enzyme characteristics. However, without adequate modeling techniques, the massive data generated by these simulations hinder the extraction of knowledge about enzyme catalytic mechanisms. Here, we propose a computational framework that applies graph-based active learning to molecular dynamics data to identify the regioselectivity of ginsenoside hydrolases (GHs), which selectively act at the C6 or C20 position to obtain rare deglycosylated bioactive compounds from plants. Experimental results show that the dynamic-aware graph model distinguishes GH regioselectivity with accuracy as high as 96-98%, even when different enzyme-substrate systems exhibit similar dynamic behaviors. The active learning strategy allows the model to perform robustly while reducing its reliance on dynamic data, indicating that it can mine sufficient knowledge from short multi-replica simulations. Moreover, the model's interpretability allowed us to identify crucial residues and features associated with regioselectivity. Our findings contribute to the understanding of GH catalytic mechanisms and provide direct assistance for rational design to improve regioselectivity. We present a general computational framework for modeling enzyme catalytic specificity from simulation data, paving the way for further integration of experimental and computational approaches in enzyme optimization and design.