xCAPT5: protein-protein interaction prediction using deep and wide multi-kernel pooling convolutional neural networks with protein language model.

Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, 144 Xuan Thuy, Hanoi, 10000, Vietnam.

Faculty of Biology, VNU University of Science, 334 Nguyen Trai, Hanoi, 10000, Vietnam.

Publication information

BMC Bioinformatics. 2024 Mar 10;25(1):106. doi: 10.1186/s12859-024-05725-6.

DOI: 10.1186/s12859-024-05725-6

PMID: 38461247

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10924985/

Abstract

BACKGROUND

Predicting protein-protein interactions (PPIs) from sequence data is a key challenge in computational biology. While various computational methods have been proposed, sequence embeddings from protein language models, which carry diverse structural, evolutionary, and functional information, have not been fully exploited. In addition, there is a clear need for a comprehensive neural network that can efficiently extract these multifaceted representations.

RESULTS

To address this gap, we propose xCAPT5, a novel hybrid classifier that leverages the T5-XL-UniRef50 protein large language model to generate rich amino acid embeddings from protein sequences. The core of xCAPT5 is a multi-kernel deep convolutional siamese neural network, which captures intricate interaction features at both micro and macro levels and is integrated with the XGBoost algorithm to enhance PPI classification performance. By concatenating max and average pooling features in a depth-wise manner, xCAPT5 learns crucial features at low computational cost.
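The multi-kernel siamese idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the kernel sizes, channel counts, use of global pooling, the element-wise product used to merge the two proteins, and all function names (`conv1d_same`, `encode`, `make_branches`) are assumptions chosen for illustration, and random arrays stand in for per-residue language-model embeddings. The XGBoost stage is omitted; in a hybrid setup of this kind, a vector like `pair` would plausibly be passed to a gradient-boosted classifier.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1D convolution with 'same' padding followed by ReLU.

    x: (L, C_in) per-residue features; w: (k, C_in, C_out); b: (C_out,).
    """
    k = w.shape[0]
    left = (k - 1) // 2
    right = k - 1 - left
    xp = np.pad(x, ((left, right), (0, 0)))
    # windows has shape (L, C_in, k): one length-k window per residue position
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=0)
    out = np.einsum("lck,kco->lo", windows, w) + b
    return np.maximum(out, 0.0)

def encode(x, branches):
    """Shared encoder: one convolution per kernel size, then global max and
    average pooling concatenated along the channel (depth) axis."""
    feats = []
    for w, b in branches:
        h = conv1d_same(x, w, b)                      # (L, C_out)
        feats.append(np.concatenate([h.max(axis=0), h.mean(axis=0)]))
    return np.concatenate(feats)                      # (n_kernels * 2 * C_out,)

def make_branches(rng, c_in, c_out, kernel_sizes):
    """Random (untrained) weights, one (w, b) pair per kernel size."""
    return [(rng.standard_normal((k, c_in, c_out)) * 0.1, np.zeros(c_out))
            for k in kernel_sizes]

rng = np.random.default_rng(0)
emb_dim = 16                                          # toy size; real embeddings are larger
branches = make_branches(rng, emb_dim, 8, kernel_sizes=(2, 3, 5))

# Two proteins of different lengths; random stand-ins for language-model embeddings.
prot_a = rng.standard_normal((30, emb_dim))
prot_b = rng.standard_normal((45, emb_dim))

# Siamese: the same weights encode both proteins.
za, zb = encode(prot_a, branches), encode(prot_b, branches)
pair = za * zb                                        # symmetric merge (an assumption)
print(pair.shape)                                     # (48,)
```

Small kernels respond to short local motifs and larger kernels to longer stretches, which is one way to read "micro and macro levels". Because pooling collapses the length axis, proteins of different lengths map to vectors of the same size.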

CONCLUSION

This study represents one of the initial efforts to extract informative amino acid embeddings from a large protein language model using a deep and wide convolutional network. Experimental results show that xCAPT5 outperforms recent state-of-the-art methods in binary PPI prediction, excelling in cross-validation on several benchmark datasets and demonstrating robust generalization across intra-species, cross-species, inter-species, and stringent similarity contexts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8481/10924985/64fdfd1f662e/12859_2024_5725_Fig1_HTML.jpg

Similar articles

Multifaceted protein-protein interaction prediction based on Siamese residual RCNN.

Bioinformatics. 2019 Jul 15;35(14):i305-i314. doi: 10.1093/bioinformatics/btz328.

Improving protein-protein interaction prediction using protein language model and protein network features.

Anal Biochem. 2024 Oct;693:115550. doi: 10.1016/j.ab.2024.115550. Epub 2024 Apr 26.

MaTPIP: A deep-learning architecture with eXplainable AI for sequence-driven, feature mixed protein-protein interaction prediction.

Comput Methods Programs Biomed. 2024 Feb;244:107955. doi: 10.1016/j.cmpb.2023.107955. Epub 2023 Nov 30.

MM-StackEns: A new deep multimodal stacked generalization approach for protein-protein interaction prediction.

Comput Biol Med. 2023 Feb;153:106526. doi: 10.1016/j.compbiomed.2022.106526. Epub 2023 Jan 3.

Modeling aspects of the language of life through transfer-learning protein sequences.

BMC Bioinformatics. 2019 Dec 17;20(1):723. doi: 10.1186/s12859-019-3220-8.

An analysis of protein language model embeddings for fold prediction.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac142.

Long-distance dependency combined multi-hop graph neural networks for protein-protein interactions prediction.

BMC Bioinformatics. 2022 Dec 5;23(1):521. doi: 10.1186/s12859-022-05062-6.

Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses.

Comput Biol Chem. 2022 Dec;101:107755. doi: 10.1016/j.compbiolchem.2022.107755. Epub 2022 Aug 13.

SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction.

BMC Genomics. 2022 Jun 27;23(1):474. doi: 10.1186/s12864-022-08687-2.

Cited by

ESM2_AMP: an interpretable framework for protein-protein interactions prediction and biological mechanism discovery.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf434.

Sliding Window Interaction Grammar (SWING): a generalized interaction language model for peptide and protein interactions.

Nat Methods. 2025 Jul 28. doi: 10.1038/s41592-025-02723-1.

StackGlyEmbed: prediction of N-linked glycosylation sites using protein language models.

Bioinform Adv. 2025 Jun 28;5(1):vbaf146. doi: 10.1093/bioadv/vbaf146. eCollection 2025.

Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis.

Methods Mol Biol. 2025;2941:243-267. doi: 10.1007/978-1-0716-4623-6_15.

Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.

Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.

Feature fusion with attributed deepwalk for protein-protein interaction prediction.

Sci Rep. 2025 Apr 10;15(1):12255. doi: 10.1038/s41598-025-96510-9.

Sliding Window INteraction Grammar (SWING): a generalized interaction language model for peptide and protein interactions.

bioRxiv. 2024 May 4:2024.05.01.592062. doi: 10.1101/2024.05.01.592062.
