Predicting protein-peptide binding residues via interpretable deep learning.

Authors

Wang Ruheng, Jin Junru, Zou Quan, Nakai Kenta, Wei Leyi

Affiliations

School of Software, Shandong University, Jinan 250101, China.

Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China.

Publication

Bioinformatics. 2022 Jun 27;38(13):3351-3360. doi: 10.1093/bioinformatics/btac352.

DOI:10.1093/bioinformatics/btac352
PMID:35604077
Abstract

SUMMARY

Identifying protein-peptide binding residues is fundamental to understanding the mechanisms of protein function and to drug discovery. Although several computational methods have been developed, most rely heavily on third-party tools or complex data preprocessing for feature design, which tends to reduce computational efficiency and limit predictive performance. To address these limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representations from Transformers)-based contrastive learning framework that predicts protein-peptide binding residues from protein sequences alone. PepBCL is an end-to-end predictive model that requires no feature engineering. Specifically, we introduce a pre-trained protein language model that automatically extracts and learns high-level latent representations of protein sequences relevant to protein structure and function. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues in the imbalanced dataset. We demonstrate that the proposed method significantly outperforms state-of-the-art methods in benchmark comparisons and achieves more robust performance. Moreover, we found that integrating traditional features with our learnt features further improves performance. Interestingly, interpretability analysis of our model highlights the flexibility and adaptability of the deep learning-based protein language model in capturing both conserved and non-conserved sequence characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we have established an online prediction platform implementing PepBCL, available at http://server.wei-group.net/PepBCL/.

AVAILABILITY AND IMPLEMENTATION

https://github.com/Ruheng-W/PepBCL.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
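
The contrastive learning idea mentioned in the summary can be illustrated with a minimal sketch. It assumes a generic margin-based pairwise contrastive loss over per-residue embeddings and is not the authors' implementation (see the PepBCL repository for that). The function names, the `margin` parameter, and the toy 2-D embeddings are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_contrastive_loss(embeddings, labels, margin=1.0):
    """Mean margin-based contrastive loss over all residue pairs.

    Same-label pairs are penalised by their squared distance.
    Different-label pairs are penalised only when closer than `margin`.
    This is a generic formulation, assumed here for illustration.
    """
    pairs = list(combinations(range(len(labels)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
    return total / len(pairs)


# Toy 2-D "residue embeddings" (made-up values): 1 = binding, 0 = non-binding.
labels = [1, 1, 0, 0, 0, 0]
mixed = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
separated = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [3.1, 3.1]]

loss_mixed = pairwise_contrastive_loss(mixed, labels)
loss_separated = pairwise_contrastive_loss(separated, labels)
print(f"mixed: {loss_mixed:.4f}  separated: {loss_separated:.4f}")
```

In this sketch the loss is lower when binding and non-binding residues form tight, well-separated clusters, which is the property a contrastive objective rewards. How the published method samples pairs and handles class imbalance is not specified in the abstract and is not modelled here.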


Similar articles

1
Predicting protein-peptide binding residues via interpretable deep learning.
Bioinformatics. 2022 Jun 27;38(13):3351-3360. doi: 10.1093/bioinformatics/btac352.
2
Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism.
Anal Biochem. 2024 Nov;694:115637. doi: 10.1016/j.ab.2024.115637. Epub 2024 Aug 8.
3
iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization.
Bioinformatics. 2021 Dec 11;37(24):4603-4610. doi: 10.1093/bioinformatics/btab677.
4
BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.
5
Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides.
Bioinformatics. 2021 Dec 11;37(24):4684-4693. doi: 10.1093/bioinformatics/btab560.
6
SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues.
Commun Biol. 2024 Jun 3;7(1):679. doi: 10.1038/s42003-024-06332-0.
7
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability.
Int J Mol Sci. 2022 Oct 16;23(20):12385. doi: 10.3390/ijms232012385.
8
ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning.
Bioinformatics. 2022 Mar 4;38(6):1514-1524. doi: 10.1093/bioinformatics/btac006.
9
BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides.
Bioinformatics. 2021 Sep 9;37(17):2556-2562. doi: 10.1093/bioinformatics/btab133.
10
DP-site: A dual deep learning-based method for protein-peptide interaction site prediction.
Methods. 2024 Sep;229:17-29. doi: 10.1016/j.ymeth.2024.06.001. Epub 2024 Jun 12.

Cited by

1
A Computational Perspective to Intermolecular Interactions and the Role of the Solvent on Regulating Protein Properties.
Chem Rev. 2025 Aug 13;125(15):7023-7056. doi: 10.1021/acs.chemrev.4c00807. Epub 2025 Jul 28.
2
SpatConv Enables the Accurate Prediction of Protein Binding Sites by a Pretrained Protein Language Model and an Interpretable Bio-spatial Convolution.
Research (Wash D C). 2025 Jul 8;8:0773. doi: 10.34133/research.0773. eCollection 2025.
3
Prediction of Protein-Peptide Binding Sites Using PepBCL.
Methods Mol Biol. 2025;2941:269-278. doi: 10.1007/978-1-0716-4623-6_16.
4
AMCL: supervised contrastive learning with hard sample mining for multi-functional therapeutic peptide prediction.
BMC Biol. 2025 Jul 1;23(1):170. doi: 10.1186/s12915-025-02273-0.
5
ConsAMPHemo: A computational framework for predicting hemolysis of antimicrobial peptides based on machine learning approaches.
Protein Sci. 2025 Jul;34(7):e70087. doi: 10.1002/pro.70087.
6
Recent progress and future challenges in structure-based protein-protein interaction prediction.
Mol Ther. 2025 May 7;33(5):2252-2268. doi: 10.1016/j.ymthe.2025.04.003. Epub 2025 Apr 6.
7
Deep Learning for Predicting Biomolecular Binding Sites of Proteins.
Research (Wash D C). 2025 Feb 24;8:0615. doi: 10.34133/research.0615. eCollection 2025.
8
Machine learning for antimicrobial peptide identification and design.
Nat Rev Bioeng. 2024 May;2(5):392-407. doi: 10.1038/s44222-024-00152-x. Epub 2024 Feb 26.
9
iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning.
Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6). doi: 10.1093/gpbjnl/qzae084.
10
PepCA: Unveiling protein-peptide interaction sites with a multi-input neural network model.
iScience. 2024 Aug 30;27(10):110850. doi: 10.1016/j.isci.2024.110850. eCollection 2024 Oct 18.