School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
School of Artificial Intelligence, Jilin University, Changchun 130012, China.
Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae603.
Protein-protein interactions (PPIs) are essential for the regulation and facilitation of virtually all biological processes. Computational tools, particularly those based on deep learning, are preferred for the efficient prediction of PPIs. Despite recent progress, two challenges remain unresolved: (i) the imbalanced nature of PPI characteristics is often ignored and (ii) capturing long-range dependencies within protein data is computationally expensive, with a cost that typically grows quadratically with the length of the protein sequence.
Here, we propose an anti-symmetric graph learning model, BaPPI, for the balanced prediction of PPIs and extrapolation of the involved patterns in PPI networks. In BaPPI, the contextualized information of protein data is efficiently handled by an attention-free mechanism formed by a recurrent convolution operator. An anti-symmetric graph convolutional network is employed to model the uneven distribution within PPI networks, aiming to learn a more robust and balanced representation of the relationships between proteins. Finally, the model is trained using an asymmetric loss. The experimental results on classical baseline datasets demonstrate that BaPPI outperforms four state-of-the-art PPI prediction methods. In terms of Micro-F1, BaPPI exceeds the second-best method by 6.5% on SHS27K and 5.3% on SHS148K. Further analysis of the generalization ability and patterns of predicted PPIs also demonstrates our model's generalizability and robustness to the imbalanced nature of PPI datasets.
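The abstract does not spell out the form of the asymmetric loss. As an illustration only, the following minimal sketch assumes the widely used asymmetric loss for multi-label classification, in which positive and negative labels get separate focusing exponents and easy negatives are discarded through a probability margin. The function name `asymmetric_loss` and the parameters `gamma_pos`, `gamma_neg`, and `margin` (with their default values) are hypothetical choices for this example and are not taken from BaPPI.

```python
import math


def asymmetric_loss(probs, labels, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Illustrative asymmetric loss over predicted probabilities.

    probs  : predicted probabilities, one per (protein pair, interaction type)
    labels : 1 if the interaction type is present, else 0
    All hyperparameter defaults are assumptions made for this sketch.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            # Positive term: focal-style weighting with exponent gamma_pos.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative term: shift the probability down by the margin so
            # that very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


# Example: three interaction-type labels for one protein pair.
loss = asymmetric_loss([0.9, 0.2, 0.04], [1, 0, 0])
```

With `gamma_pos = gamma_neg = 0` and `margin = 0`, this sketch reduces to ordinary binary cross-entropy. Raising `gamma_neg` shrinks the contribution of the many negative labels, which is one common way to counter label imbalance of the kind the abstract describes.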
The source code of this work is publicly available at https://github.com/ttan6729/BaPPI.