Zhou Zhiyuan, Yin Yueming, Han Hao, Jia Yiping, Koh Jun Hong, Kong Adams Wai-Kin, Mu Yuguang
School of Biological Sciences, Nanyang Technological University, 637551, Singapore.
Institute for Digital Molecular Analytics and Science (IDMxS), Nanyang Technological University, 636921, Singapore.
J Chem Inf Model. 2024 Dec 9;64(23):8796-8808. doi: 10.1021/acs.jcim.4c01850. Epub 2024 Nov 18.
Protein-protein interactions (PPIs) are crucial for understanding biological processes and disease mechanisms, contributing significantly to advances in protein engineering and drug discovery. The accurate determination of binding affinities, essential for decoding PPIs, is hampered by the substantial time and financial costs of both experimental and theoretical methods. This underscores the urgent need for more efficient and precise methods for predicting binding affinity. Despite the abundance of research on PPI modeling, quantitative binding affinity prediction remains underexplored, mainly due to a lack of comprehensive data. This study addresses these needs by manually curating pairwise interaction labels on available 3D structures of protein complexes with experimentally determined binding affinities, creating the largest data set to date for structure-based pairwise protein interactions with binding affinity. Subsequently, we introduce ProAffinity-GNN, a novel deep learning framework that uses a protein language model and a graph neural network (GNN) to improve the accuracy of structure-based protein-protein binding affinity prediction. Evaluation results across several benchmark test sets and an additional case study demonstrate that ProAffinity-GNN not only outperforms existing models in accuracy but also shows strong generalization capabilities.