Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad376.
Advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites from protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representations from Transformers (BERT) model, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained MP-BERT model was fine-tuned as MPB-PPI (MP-BERT on PPI) and outperformed state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs was evaluated across multiple organisms. An amalgamated organism model was designed, which generalized well across the majority of organisms and attained an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning for protein pair tasks.
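To make the idea of "protein pairs as inputs" concrete, the sketch below shows one common way a BERT-style model can receive two sequences at once: both are concatenated into a single token stream with separator tokens, and segment ids mark which protein each residue belongs to. This is only an illustration under stated assumptions. The abstract does not describe MP-BERT's tokenizer, vocabulary, or maximum length, so the vocabulary, the function name `encode_pair`, and the truncation rule here are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical vocabulary (an assumption, not MP-BERT's actual one):
# four special tokens followed by the 20 standard amino acids.
SPECIAL = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + AMINO_ACIDS)}


def encode_pair(
    seq_a: str, seq_b: str, max_len: int = 32
) -> Tuple[List[int], List[int], List[int]]:
    """Encode two protein sequences as one BERT-style paired input.

    Returns (input_ids, segment_ids, attention_mask), each of length max_len.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")

    unk = VOCAB["[UNK]"]
    ids_a = [VOCAB.get(ch, unk) for ch in seq_a.upper()]
    ids_b = [VOCAB.get(ch, unk) for ch in seq_b.upper()]

    # Trim the longer sequence one residue at a time until the pair fits;
    # three positions are reserved for the special tokens.
    while len(ids_a) + len(ids_b) > max_len - 3:
        longer = ids_a if len(ids_a) >= len(ids_b) else ids_b
        longer.pop()

    # Layout: [CLS] A... [SEP] B... [SEP]
    input_ids = [VOCAB["[CLS]"]] + ids_a + [VOCAB["[SEP]"]] + ids_b + [VOCAB["[SEP]"]]
    # Segment 0 covers [CLS], protein A and its [SEP]; segment 1 covers protein B and its [SEP].
    segment_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)

    # Pad every list to max_len; padded positions are masked out.
    pad = max_len - len(input_ids)
    input_ids += [VOCAB["[PAD]"]] * pad
    segment_ids += [0] * pad
    attention_mask += [0] * pad
    return input_ids, segment_ids, attention_mask


if __name__ == "__main__":
    ids, segs, mask = encode_pair("MKT", "AC", max_len=10)
    print(ids)
    print(segs)
    print(mask)
```

In a setup like this, a pair-level classifier (interacting or not) would typically read the representation at the `[CLS]` position, while a site-level predictor would read the per-residue representations; whether MPB-PPI and MPB-PPISP do exactly this is not stated in the abstract.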