Huang Rong, Zhang Hejian, Wu Min, Men Zhiyue, Chu Huanyu, Bai Jie, Chang Hong, Cheng Jian, Liao Xiaoping, Liu Yuwan, Song Yajian, Jiang Huifeng
School of Biological Engineering, Tianjin University of Science & Technology, Tianjin 300457, China.
Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
Sheng Wu Gong Cheng Xue Bao. 2024 Dec 25;40(12):4670-4681. doi: 10.13345/j.cjb.240255.
The structures and activities of enzymes are influenced by the pH of their environment. Understanding and distinguishing how enzymes adapt to extreme pH values is important both for elucidating the underlying molecular mechanisms and for promoting the industrial application of enzymes. In this study, we collected secreted microbial proteins with optimal performance above pH 9 or below pH 5, yielding 47 725 high-pH and 66 079 low-pH protein sequences, and encoded them with the ESM-2 protein language model. On these encodings we built a deep learning model that identifies the acid-base tolerance of a protein from its amino acid sequence. The model achieved significantly higher accuracy than other methods, with an overall accuracy of 94.8%, a precision of 91.8%, and a recall of 93.4% on the test set. We also built a website (https://enzymepred.biodesign.ac.cn) where users can predict acid-base tolerance by submitting enzyme protein sequences. This work is expected to accelerate the application of enzymes in fields including biotechnology, pharmaceuticals, and chemicals, and it provides a powerful tool for the rapid screening and optimization of industrial enzymes.
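The test-set figures above are standard binary-classification metrics. The sketch below shows how accuracy, precision and recall are derived from the counts of a confusion matrix. It treats high-pH as the positive class, which is an assumption because the abstract does not say which class is positive. The function names and toy labels are hypothetical. They are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: 1 = high-pH, 0 = low-pH)."""
    tp: int  # high-pH proteins predicted as high-pH
    fp: int  # low-pH proteins predicted as high-pH
    tn: int  # low-pH proteins predicted as low-pH
    fn: int  # high-pH proteins predicted as low-pH


def count_confusion(y_true, y_pred):
    """Tally confusion counts from paired 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp, fp, tn, fn)


def binary_metrics(c):
    """Accuracy = (TP+TN)/all; precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(count_confusion(y_true, y_pred))
print(metrics)  # accuracy 0.625, precision 0.6, recall 0.75
```

Precision and recall depend on which class is labelled positive. If low-pH were the positive class, the same predictions would give different precision and recall values, while accuracy would stay the same.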