Basis for Accurate Protein pKa Prediction with Machine Learning

Affiliation

College of Computer Engineering, Jimei University, Xiamen 361021, China.

Publication information

J Chem Inf Model. 2023 May 22;63(10):2936-2947. doi: 10.1021/acs.jcim.3c00254. Epub 2023 May 5.

Abstract

pH regulates protein structures and the associated functions in many biological processes via protonation and deprotonation of ionizable side chains, where the titration equilibria are determined by pKa values. To accelerate research on pH-dependent molecular mechanisms in the life sciences, as well as industrial protein and drug design, fast and accurate pKa prediction is crucial. Here we present a theoretical pKa data set, PHMD549, which was successfully applied to four distinct machine learning methods, including DeepKa, which was proposed in our previous work. To enable a valid comparison, EXP67S was selected as the test set. Encouragingly, DeepKa was improved significantly and outperforms other state-of-the-art methods, except for constant-pH molecular dynamics, which was used to create PHMD549. More importantly, DeepKa reproduced the experimental pKa orders of acidic dyads in five enzyme catalytic sites. Apart from structured proteins, DeepKa was found to be applicable to intrinsically disordered peptides. Further, in combination with solvent exposure, it is revealed that DeepKa offers the most accurate prediction under the challenging circumstance in which hydrogen bonding or salt bridge interaction is partly compensated by desolvation for a buried side chain. Finally, our benchmark data qualify PHMD549 and EXP67S as the basis for future developments of protein pKa prediction tools driven by artificial intelligence. In addition, DeepKa built on PHMD549 has been proven to be an efficient protein pKa predictor and thus can be applied immediately to, for example, pKa database construction, protein design, and drug discovery.
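The link between pKa and protonation state mentioned at the start of the abstract can be illustrated with the textbook Henderson–Hasselbalch relation for a single, non-interacting titratable site. The sketch below is not DeepKa or constant-pH molecular dynamics; it only shows why pKa accuracy matters. The function name and the example pKa values are hypothetical, chosen for illustration.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single, non-interacting ionizable site that is protonated.

    Henderson-Hasselbalch: f_prot = 1 / (1 + 10**(pH - pKa)).
    Real protein sites can couple to each other, which this ignores.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    ph = 7.0
    # Hypothetical pKa values: a typical exposed carboxyl, and a site whose
    # pKa is predicted as 7.0 versus 8.0 (a 1-unit prediction error).
    for pka in (4.0, 7.0, 8.0):
        print(f"pKa={pka:.1f} at pH {ph:.1f}: "
              f"protonated fraction = {protonated_fraction(pka, ph):.3f}")
```

Running it prints protonated fractions of about 0.001, 0.500 and 0.909 for the three cases: near a site's pKa, a 1-unit error in the predicted pKa changes the predicted protonated population from roughly half to roughly 90 percent.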
