College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China.
Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
Int J Mol Sci. 2024 Jun 6;25(11):6252. doi: 10.3390/ijms25116252.
Enzymes play a crucial role in industrial production and pharmaceutical development, serving as catalysts for numerous biochemical reactions. Determining the optimal catalytic temperature (Topt) of enzymes is essential for optimizing reaction conditions, enhancing catalytic efficiency, and accelerating industrial processes. However, because experimentally determined data are limited and existing computational methods predict Topt with insufficient accuracy, there is an urgent need for a computational approach that predicts enzyme Topt values accurately. In this study, using phosphatases (EC 3.1.3.X) as an example, we constructed a machine learning model that uses amino acid frequency and protein molecular weight as features and employs the K-nearest neighbors regression algorithm to predict enzyme Topt. Researchers engineering enzyme thermostability usually avoid modifying conserved amino acids. We therefore used this model to predict the Topt of phosphatase sequences after removing conserved amino acids. Compared with the model based on complete sequences, the mean coefficient of determination (R²) increased from 0.599 to 0.755. Subsequently, experimental validation on 10 phosphatases with previously undetermined optimal catalytic temperatures showed that, for most of them, the Topt predicted from sequences without conserved amino acids was closer to the experimentally measured value. This study lays the foundation for the rapid selection of enzymes suited to industrial conditions.
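The modeling pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes standard average residue masses for the molecular-weight feature, and it treats a position as "conserved" only when its alignment column is 100% identical (the study's actual conservation criterion is not given here). It also assumes z-score standardization of features before Euclidean k-NN, and it uses hypothetical choices of `k=2` and leave-one-out evaluation. The function names (`remove_conserved`, `features`, `knn_predict`, `loo_r2`) and the toy sequences and Topt values are illustrative only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water molecule (Da), added once per chain


def remove_conserved(aligned):
    """Drop alignment columns in which every sequence carries the same symbol
    (assumed definition of 'conserved'), then strip '-' gap characters.
    Assumes equal-length aligned strings."""
    keep = [i for i in range(len(aligned[0]))
            if len({seq[i] for seq in aligned}) > 1]
    return ["".join(seq[i] for i in keep).replace("-", "") for seq in aligned]


def features(seq):
    """20 amino acid frequencies followed by molecular weight (Da).
    Only the 20 standard residues are supported."""
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    mw = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return freqs + [mw]


def fit_scaler(X):
    """Per-feature mean and standard deviation (a std of 0 is replaced by 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(x, means, stds):
    """Z-score one feature vector with precomputed statistics."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def knn_predict(X_train, y_train, x, k=2):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def loo_r2(seqs, y, k=2):
    """Leave-one-out R^2; the scaler is refit on each training fold."""
    X = [features(s) for s in seqs]
    preds = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means, stds = fit_scaler(X_tr)
        X_tr_s = [scale(x, means, stds) for x in X_tr]
        preds.append(knn_predict(X_tr_s, y_tr, scale(X[i], means, stds), k))
    return r_squared(y, preds)


if __name__ == "__main__":
    # Hypothetical toy alignment and made-up Topt values (degrees C).
    aligned = ["MKT-AHLG", "MKTQAHIG", "MRTQSHLG", "MKSQAHLG"]
    topt = [37.0, 45.0, 60.0, 50.0]
    full = [s.replace("-", "") for s in aligned]
    print("full-sequence LOO R^2:", loo_r2(full, topt))
    print("conserved-removed LOO R^2:", loo_r2(remove_conserved(aligned), topt))
```

Standardizing inside each leave-one-out fold keeps the held-out sequence from influencing the scaling. Without some scaling, the molecular-weight feature (tens of thousands of daltons for real proteins) would dominate the amino acid frequencies (values between 0 and 1) in the distance calculation.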