DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model.

Affiliations

State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China.

Peng Cheng Laboratory, Shenzhen 518055, China.

Publication information

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad718.

DOI:10.1093/bioinformatics/btad718

PMID:38015872

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10723037/

Abstract

MOTIVATION

Identifying the functional sites of a protein, such as binding sites for proteins, peptides, or other biological components, is crucial for understanding related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information.

RESULTS

In this study, DeepProSite is presented as a new framework for identifying protein binding sites that uses both protein structure and sequence information. DeepProSite first generates protein structures with ESMFold and sequence representations with pretrained language models. It then applies a Graph Transformer and formulates binding site prediction as a graph node classification task. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when applied to unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of binding residues is established as the implementation of the proposed DeepProSite.
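
The pipeline described above (predicted structure, residue graph, attention over neighbouring residues, per-residue classification) can be illustrated with a small sketch. This is a toy pure-Python example, not the authors' implementation: the function names, the k-nearest-neighbour graph rule, the coordinates, the two-dimensional "embeddings", and the single attention round without learned projections are all assumptions made for illustration. A real Graph Transformer uses learned query/key/value projections, multiple heads, and several layers.

```python
import math


def knn_residue_graph(coords, k=2):
    """Map each residue index to the indices of its k spatially nearest residues.

    `coords` holds one 3D point per residue (e.g. C-alpha positions taken
    from a predicted structure). The kNN rule and k are illustrative choices.
    """
    graph = {}
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        graph[i] = others[:k]
    return graph


def attention_update(features, graph):
    """One round of scaled dot-product attention over each node's neighbours.

    Hypothetical simplification: no learned weights, one head, and a residual
    connection that adds the aggregated neighbour features to the node itself.
    """
    updated = []
    for i in range(len(features)):
        neighbours = graph[i]
        if not neighbours:  # isolated node: keep its features unchanged
            updated.append(list(features[i]))
            continue
        scale = math.sqrt(len(features[i]))
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in neighbours
        ]
        # Numerically stable softmax over the neighbour scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        aggregated = [
            sum(w * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(len(features[i]))
        ]
        updated.append([x + y for x, y in zip(features[i], aggregated)])
    return updated


def classify_nodes(features, weight, bias=0.0):
    """Logistic output layer: one binding probability per residue (node)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(weight, f)) + bias)))
        for f in features
    ]


# Toy "folded" chain: residues 0 and 3 are far apart in sequence but close in space.
coords = [(0, 0, 0), (4, 0, 0), (4, 4, 0), (0, 3, 0)]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # stand-in for language model output

graph = knn_residue_graph(coords, k=2)
hidden = attention_update(embeddings, graph)
probs = classify_nodes(hidden, weight=[0.5, -0.5])  # arbitrary, untrained weights

print(graph[0])  # nearest neighbours of residue 0; residue 3 comes first
print(len(probs))
```

In this toy chain the nearest spatial neighbour of residue 0 is residue 3, which a window over sequence-adjacent residues would not reach. That is the kind of structural context the abstract says sequence-only methods lack.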

AVAILABILITY AND IMPLEMENTATION

The datasets and source code can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The DeepProSite web server can be accessed at https://inner.wei-group.net/DeepProSite/.
