Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, No. 130, Meilong Road, Xuhui District, Shanghai, 200237, China.
Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, No. 3663, Zhongshan North Road, Putuo District, Shanghai, 200062, China.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae408.
In drug discovery and design, the acid-base dissociation constant (pKa) of a molecule is a key property because it strongly influences ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties and biological activity. However, the experimental determination of pKa values is often laborious and complex. Moreover, existing prediction methods are limited both by the quantity and quality of their training data and by their capacity to handle the complex structural and physicochemical properties of compounds, which impedes accuracy and generalization. A method that predicts molecular pKa values quickly and accurately would therefore help guide the structural modification of molecules and thus assist the development of new drugs. In this study, we developed a pKa prediction model named GR-pKa (Graph Retention pKa), which uses a message-passing neural network and a multi-fidelity learning strategy to predict molecular pKa values. The GR-pKa model incorporates five quantum mechanical properties related to molecular thermodynamics and dynamics as key features to characterize molecules. Notably, we introduced the retention mechanism into the message-passing phase, which significantly improves the model's ability to capture and update molecular information. Our GR-pKa model outperforms several state-of-the-art models in predicting macro-pKa values, achieving a mean absolute error (MAE) of 0.490, a root mean square error (RMSE) of 0.588, and an R² of 0.937 on the SAMPL7 dataset.
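For readers who want to see how the three reported metrics relate to one another, the following is a minimal sketch of how MAE, RMSE, and R² are commonly computed for a set of predicted versus experimental pKa values. It is not the authors' evaluation code: the function name `regression_metrics` and the example values are hypothetical, and R² is assumed here to be the coefficient of determination (some studies instead report the squared Pearson correlation).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired experimental and predicted values.

    Assumption: R^2 is the coefficient of determination,
    1 - SS_res / SS_tot, not the squared Pearson correlation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: penalizes large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: fraction of variance in y_true explained.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y_true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Hypothetical pKa values for illustration only (not taken from SAMPL7).
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae, rmse, r2 = regression_metrics(experimental, predicted)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; the gap between the reported 0.490 and 0.588 reflects how unevenly the errors are distributed across molecules.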