Lebedenko Olga O, Polovinkin Mikhail S, Kazovskaia Anastasiia A, Skrynnikov Nikolai R
Laboratory of Biomolecular NMR, St. Petersburg State University, St. Petersburg, Russia.
Faculty of Mathematics & Computer Science, St. Petersburg State University, St. Petersburg, Russia.
Proteins. 2025 Sep;93(9):1498-1506. doi: 10.1002/prot.26821. Epub 2025 Mar 21.
In this communication, we introduce a new structure-based affinity predictor for protein-protein complexes. This predictor, dubbed PCANN (Protein Complex Affinity by Neural Network), uses the ESM-2 language model to encode information about protein binding interfaces and a graph attention network (GAT) to turn this information into predictions. In tests on two previously unused literature-extracted datasets, PCANN performed better than the best of the publicly available predictors, BindPPI, with a mean absolute error (MAE) of 1.3 versus 1.4 kcal/mol. Further progress in developing predictors based on deep learning models faces two problems: (i) the amount of experimental data available to train and test new predictors is limited and (ii) the available data are often not very accurate and lack internal consistency with respect to measurement conditions. These issues can potentially be addressed through an AI-leveraged literature search followed by careful human curation, and by introducing additional parameters to account for variations in experimental conditions.
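The MAE figures quoted above compare predicted and experimental binding free energies in kcal/mol. As a minimal sketch of how such a score could be computed, the Python below converts dissociation constants to free energies with the standard relation ΔG = RT ln(Kd) and then averages the absolute errors. The abstract does not describe the authors' evaluation code, so the function names `dg_from_kd` and `mean_absolute_error`, the 298.15 K default temperature, and all numerical values are assumptions made only for illustration.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def dg_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd) with a 1 M standard state. The default
    temperature is an assumption; real measurements vary in temperature,
    which is one of the inconsistencies the abstract points to.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def mean_absolute_error(predicted, experimental):
    """Mean absolute error between two equal-length sequences of values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


# Hypothetical data, not taken from the paper: experimental Kd values (mol/L)
# and made-up predicted free energies (kcal/mol) for three complexes.
experimental_kd = [1e-9, 5e-8, 2e-6]
predicted_dg = [-11.0, -10.5, -8.5]

experimental_dg = [dg_from_kd(kd) for kd in experimental_kd]
mae = mean_absolute_error(predicted_dg, experimental_dg)
print(f"MAE = {mae:.2f} kcal/mol")
```

Because ΔG depends on temperature through the RT factor, the same Kd measured at different temperatures maps to different free energies, which is one way inconsistent measurement conditions can add noise to a benchmark of this kind.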