Tao Fangting, Sun Jinyuan, Gao Pengyue, Gao George Fu, Wu Bian
Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China.
Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China.
Natl Sci Rev. 2025 Jun 10;12(6):nwaf231. doi: 10.1093/nsr/nwaf231. eCollection 2025 Jun.
Protein-protein interactions (PPIs) are essential for numerous biological functions, and predicting the binding affinity changes caused by mutations is crucial for understanding the impact of genetic variation and for advancing protein engineering. Although machine-learning-based methods show promise for improving prediction accuracy, limited experimental data remain a significant bottleneck. In this study, we employed multitask learning and self-distillation to overcome this data limitation and improve the accuracy of protein-protein binding affinity prediction. By incorporating a mutation stability prediction task, our model, Pythia-PPI, achieved state-of-the-art accuracy on the SKEMPI dataset; it was then used to predict binding affinity changes for millions of mutations, generating an expanded dataset for self-distillation. Compared with prevalent methods, Pythia-PPI increased the Pearson correlation between predictions and experimental data from 0.6447 to 0.7850 on the SKEMPI dataset and from 0.3654 to 0.6050 on the viral-receptor dataset. Experimental validation further confirmed its ability to identify high-affinity mutations in the CB6 antibody in complex with the prototype receptor binding domain of severe acute respiratory syndrome coronavirus 2: the best single-point mutant among the top 10 predictions showed a 2-fold increase in binding affinity. These findings demonstrate that Pythia-PPI is a valuable tool for analysing the fitness landscape of PPIs. A web server for Pythia-PPI is available at https://pythiappi.wulab.xyz for easy access.
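The abstract names three technical ingredients: a multitask objective that pairs binding-affinity-change prediction with mutation-stability prediction, a self-distillation step in which the trained model labels many additional mutations, and Pearson correlation as the evaluation metric. The Python sketch below illustrates each one under stated assumptions; it is not the Pythia-PPI implementation. The `Example` record, its field names, the numeric feature encoding, the skip-missing-labels loss, the `stab_weight` of 0.5, and the linear `toy_teacher` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of the training ingredients named in the abstract.
# Not the Pythia-PPI code: record fields, feature encoding, and stab_weight are assumptions.
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Example:
    features: Sequence[float]          # hypothetical numeric encoding of one mutation
    ddg_bind: Optional[float] = None   # binding-affinity change label, if available
    ddg_stab: Optional[float] = None   # stability change label, if available
    is_pseudo: bool = False            # True when the label came from the teacher


def multitask_loss(pred_bind: float, pred_stab: float, ex: Example,
                   stab_weight: float = 0.5) -> float:
    """Squared error over whichever labels the example carries; missing ones are skipped."""
    loss = 0.0
    if ex.ddg_bind is not None:
        loss += (pred_bind - ex.ddg_bind) ** 2
    if ex.ddg_stab is not None:
        loss += stab_weight * (pred_stab - ex.ddg_stab) ** 2
    return loss


def expand_with_pseudo_labels(teacher: Callable[[Sequence[float]], float],
                              labeled: List[Example],
                              unlabeled: List[Sequence[float]]) -> List[Example]:
    """Self-distillation data step: a trained teacher labels unmeasured mutations."""
    pseudo = [Example(f, ddg_bind=teacher(f), is_pseudo=True) for f in unlabeled]
    return labeled + pseudo


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predictions and experimental values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def toy_teacher(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    return 0.8 * features[0] - 0.3 * features[1]


if __name__ == "__main__":
    labeled = [Example([1.0, 0.0], ddg_bind=0.9, ddg_stab=1.2),
               Example([0.0, 1.0], ddg_bind=-0.2)]  # no stability label
    expanded = expand_with_pseudo_labels(toy_teacher, labeled, [[2.0, 1.0], [0.5, 0.5]])
    print(len(expanded), sum(e.is_pseudo for e in expanded))   # 4 2
    print(round(multitask_loss(1.0, 1.0, labeled[0]), 4))      # 0.03
    print(round(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]), 4))     # 0.8
```

Skipping missing labels lets examples that carry only a binding label, only a stability label, or both share one training set, which is one common way to combine tasks whose datasets overlap only partially. In a full pipeline, the expanded list would feed a second round of training of the same model.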