State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China.
Peng Cheng Laboratory, Shenzhen 518055, China.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad718.
Identifying the functional sites of a protein, such as the binding sites for proteins, peptides, or other biological components, is crucial for understanding related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information.
In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes both protein structure and sequence information. DeepProSite first generates protein structures with ESMFold and sequence representations with pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as a graph node classification task. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when applied to unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residues is established as the implementation of the proposed DeepProSite.
The datasets and source code can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The DeepProSite web server can be accessed at https://inner.wei-group.net/DeepProSite/.
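To make the idea of binding site prediction as graph node classification more concrete, the following is a minimal sketch in plain Python. It is not the DeepProSite implementation: the k-nearest-neighbor graph construction, the single dot-product attention step, the logistic classifier, and all function and parameter names (`knn_graph`, `attention_update`, `classify_nodes`, `k`, `weight`, `bias`, `threshold`) are illustrative assumptions. In practice the coordinates would come from a predicted structure and the per-residue features from a pretrained language model; here both are toy values.

```python
import math


def knn_graph(coords, k):
    """For each residue, return the indices of its k nearest residues.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    Assumption: a k-NN graph is just one plausible way to define edges.
    """
    neighbors = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def attention_update(features, neighbors):
    """One dot-product attention step restricted to graph neighbors.

    Each residue attends to itself and its neighbors; the new feature
    vector is the softmax-weighted average of those feature vectors.
    """
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + nbrs
        dim = len(features[i])
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            for j in group
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        updated.append(
            [
                sum(w / total * features[j][c] for w, j in zip(weights, group))
                for c in range(dim)
            ]
        )
    return updated


def classify_nodes(features, weight, bias, threshold=0.5):
    """Label each residue: 1 = predicted binding residue, 0 = non-binding."""
    labels = []
    for f in features:
        logit = sum(w * x for w, x in zip(weight, f)) + bias
        prob = 1.0 / (1.0 + math.exp(-logit))
        labels.append(1 if prob >= threshold else 0)
    return labels


# Toy usage: four residues on a line, with 2-dimensional features.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, -1.0]]
graph = knn_graph(coords, k=2)
hidden = attention_update(features, graph)
labels = classify_nodes(hidden, weight=[1.0, 1.0], bias=0.0)
```

The point of the sketch is the data flow: spatial neighbors, rather than sequence neighbors, decide which residues exchange information before each residue receives its own binding or non-binding label.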