Gokcan Hatice, Isayev Olexandr
Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA
Chem Sci. 2022 Feb 1;13(8):2462-2474. doi: 10.1039/d1sc05610g. eCollection 2022 Feb 23.
The behavior of proteins is closely related to the protonation states of their residues. Therefore, prediction and measurement of pKa are essential for understanding the basic functions of proteins. In this work, we develop a new empirical scheme for protein pKa prediction that is based on deep representation learning. It combines machine learning with atomic environment vectors (AEVs) and a learned quantum mechanical representation from the ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinates of a protein as input and estimates the pKa separately for each of the five titratable amino acid types. The accuracy of the approach was assessed with both cross-validation and an external test set of proteins. The results were compared with the widely used empirical approach PROPKA. The new empirical model achieves mean absolute errors (MAEs) below 0.5 pKa units for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. Our model is also sensitive to local conformational changes and molecular interactions.
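The comparison against a null model described in the abstract can be illustrated with a minimal Python sketch. It assumes that the null model assigns every residue a fixed reference pKa for its amino acid type, which is a common convention but is not spelled out in the abstract. The reference values, residue list, and pKa numbers below are hypothetical toy data chosen for illustration only; they are not results from the paper, and the function names are not part of any published code.

```python
from statistics import mean

# Assumed reference pKa per residue type, for illustration only.
# These are NOT values taken from the paper.
REFERENCE_PKA = {"ASP": 3.9, "GLU": 4.3, "HIS": 6.5}


def mean_absolute_error(predicted, experimental):
    """Return the mean absolute error between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    return mean(abs(p - e) for p, e in zip(predicted, experimental))


def null_model_predictions(residue_types):
    """Null model (assumed definition): every residue gets its reference pKa."""
    return [REFERENCE_PKA[r] for r in residue_types]


# Hypothetical toy data: residue types, measured pKa, and model predictions.
residues = ["ASP", "GLU", "HIS", "ASP"]
experimental_pka = [3.5, 4.5, 7.0, 4.4]
model_pka = [3.7, 4.4, 6.8, 4.1]

model_mae = mean_absolute_error(model_pka, experimental_pka)
null_mae = mean_absolute_error(null_model_predictions(residues), experimental_pka)

print(f"model MAE: {model_mae:.2f} pKa units")
print(f"null  MAE: {null_mae:.2f} pKa units")
```

A model is only informative if its MAE is clearly lower than the null-model MAE on the same residues, which is why the abstract reports both comparisons.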