Cai Zhitao, Luo Fangfang, Wang Yongxian, Li Enling, Huang Yandong
College of Computer Engineering, Jimei University, Xiamen 361021, China.
ACS Omega. 2021 Dec 7;6(50):34823-34831. doi: 10.1021/acsomega.1c05440. eCollection 2021 Dec 21.
Protein pKa prediction is essential for investigating the pH-dependent relationship between protein structure and function. In this work, we introduce DeepKa, a deep learning-based protein pKa predictor that is trained and validated on pKa values derived from continuous constant-pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. The CpHMD implementation in the Amber molecular dynamics package was employed (Huang Y. J. Chem. Inf. Model. 2018, 58, 1372-1383). Notably, to avoid discontinuities at the boundary, grid charges are proposed to represent protein electrostatics. We show that the prediction accuracy of DeepKa is close to that of the CpHMD benchmarking simulations, validating DeepKa as an efficient protein pKa predictor. In addition, the training and validation sets created in this study can be used to develop future machine learning-based protein pKa predictors. Finally, the grid charge representation is general and applicable to other topics, such as protein-ligand binding affinity prediction.
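The abstract does not spell out how grid charges are computed. As a minimal sketch of the general idea, assuming a trilinear weighting scheme (an assumption, not necessarily the scheme used in DeepKa), each atomic point charge can be shared among the eight corners of its enclosing grid cell. The weights vary smoothly with atom position, so a small displacement of an atom never causes a jump in the grid values. The function name `spread_charges` and all of its parameters are hypothetical.

```python
from math import floor

def spread_charges(atoms, origin, spacing, shape):
    """Distribute point charges onto a regular 3D grid by trilinear weighting.

    Hypothetical helper for illustration; not taken from DeepKa.
    atoms:   iterable of (x, y, z, charge)
    origin:  (x0, y0, z0) coordinate of grid point (0, 0, 0)
    spacing: grid spacing, in the same length unit as the coordinates
    shape:   (nx, ny, nz) number of grid points per axis
    Returns a dict mapping (i, j, k) -> accumulated charge.
    """
    grid = {}
    for x, y, z, q in atoms:
        # Fractional grid coordinates of the atom.
        f = [(c - o) / spacing for c, o in zip((x, y, z), origin)]
        base = [floor(v) for v in f]
        frac = [v - b for v, b in zip(f, base)]
        # Skip atoms whose enclosing cell is not fully inside the grid.
        if any(b < 0 or b + 1 >= n for b, n in zip(base, shape)):
            continue
        # Share the charge among the eight corners of the enclosing cell.
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    w = ((frac[0] if di else 1 - frac[0])
                         * (frac[1] if dj else 1 - frac[1])
                         * (frac[2] if dk else 1 - frac[2]))
                    key = (base[0] + di, base[1] + dj, base[2] + dk)
                    grid[key] = grid.get(key, 0.0) + q * w
    return grid
```

For example, a unit charge at x = 0.25 on a grid with unit spacing puts 0.75 of the charge on the nearer grid point and 0.25 on the farther one, and the total charge on the grid is conserved.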