Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China.
Zelixir Biotech Company Ltd, Shanghai, China.
Protein Sci. 2024 Sep;33(9):e5097. doi: 10.1002/pro.5097.
Disulfide bonds, covalent links formed between the sulfur atoms of cysteine residues, play a crucial role in protein folding and structural stability. For this reason, artificial disulfide bonds are often introduced to enhance protein thermostability. Although a growing number of tools can assist with this task, considerable time and resources are still wasted on designs that were not adequately considered. To improve the accuracy and efficiency of designing disulfide bonds for protein thermostability, we first collected disulfide bond and protein thermostability data from an extensive set of literature sources. We then extracted a range of sequence- and structure-based features and built machine-learning models to predict whether a disulfide bond can improve protein thermostability. Among all models, the neighborhood context model based on the Adaboost-DT algorithm performed best, with an area under the receiver operating characteristic curve (AUC) of 0.773 and an accuracy of 0.714. We also found that AlphaFold2 was highly effective at predicting disulfide bonds and that, to some extent, the coevolutionary relationship between residue pairs may guide artificial disulfide bond design. Moreover, several mutants of imine reductase 89 (IR89) carrying artificially designed thermostable disulfide bonds were experimentally shown to be highly efficient in substrate catalysis. The disulfide bond data have been integrated into an online server, ThermoLink, available at guolab.mpu.edu.mo/thermoLink.
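The abstract reports two evaluation metrics for the binary classifier: area under the ROC curve and accuracy. As a minimal sketch of how such scores are conventionally computed (not the authors' evaluation code), the following uses the standard pairwise-ranking definition of AUC and a fixed decision threshold for accuracy. The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of candidates whose thresholded score matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win, which
    matches the usual area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the paper:
# label 1 = the engineered disulfide bond improved thermostability, 0 = it did not.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical model scores (predicted probability of improvement).
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, accuracy = {toy_acc:.3f}")
```

On this toy data the sketch gives an AUC of 0.875 and an accuracy of 0.625. AUC depends only on how the candidates are ranked, whereas accuracy also depends on the chosen threshold, which is why the two reported values (0.773 and 0.714) need not move together.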