DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model.

Affiliations

State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China.

Peng Cheng Laboratory, Shenzhen 518055, China.

Publication information

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad718.

Abstract

MOTIVATION

Identifying the functional sites of a protein, such as its binding sites for proteins, peptides, or other biological components, is crucial for understanding the related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy because they consider only sequence-adjacent contextual features and lack structural information.

RESULTS

This study presents DeepProSite, a new framework for identifying protein binding sites that uses both protein structure and sequence information. DeepProSite first generates protein structures with ESMFold and sequence representations with pretrained language models. It then applies a Graph Transformer and formulates binding site prediction as a graph node classification task. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when applied to unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, demonstrating its generalization capability. Finally, an online server for predicting multiple types of binding residues is established as an implementation of DeepProSite.
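The idea of treating binding site prediction as graph node classification can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 10 Å C-alpha distance cutoff, the single attention layer, the random weights, and all function names are assumptions made for illustration. In an actual pipeline, the coordinates would come from a predicted structure (for example from ESMFold) and the node features from a pretrained protein language model.

```python
import numpy as np


def build_residue_graph(ca_coords, radius=10.0):
    """Connect residues whose C-alpha atoms lie within `radius` angstroms.

    The cutoff value is an illustrative assumption. The returned boolean
    adjacency matrix includes self-loops (distance 0 on the diagonal).
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist <= radius


def masked_attention_layer(h, adj, w_q, w_k, w_v):
    """One self-attention step in which each residue attends only to its
    graph neighbours (a simplified stand-in for a graph transformer layer)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(adj, scores, -np.inf)       # mask non-neighbours
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def predict_binding_probabilities(h, adj, params):
    """Node classification: one binding probability per residue."""
    z = masked_attention_layer(h, adj, params["w_q"], params["w_k"], params["w_v"])
    logits = z @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits))


# Toy inputs: random coordinates and features stand in for a predicted
# structure and language-model embeddings (both hypothetical here).
rng = np.random.default_rng(0)
n_res, dim = 6, 8
ca_coords = rng.uniform(0.0, 20.0, size=(n_res, 3))
features = rng.normal(size=(n_res, dim))
params = {
    "w_q": rng.normal(scale=0.1, size=(dim, dim)),
    "w_k": rng.normal(scale=0.1, size=(dim, dim)),
    "w_v": rng.normal(scale=0.1, size=(dim, dim)),
    "w_out": rng.normal(scale=0.1, size=dim),
    "b_out": 0.0,
}

adj = build_residue_graph(ca_coords)
probs = predict_binding_probabilities(features, adj, params)
print(probs.round(3))  # untrained weights, so the values are not meaningful
```

Each residue is a node, so the output has one probability per residue. With trained weights, thresholding these probabilities would give the predicted binding residues.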

AVAILABILITY AND IMPLEMENTATION

The datasets and source code are available at https://github.com/WeiLab-Biology/DeepProSite. The DeepProSite web server is available at https://inner.wei-group.net/DeepProSite/.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f34e/10723037/f70ff8f65a96/btad718f1.jpg
