Structure-based identification of catalytic residues.

Author information

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.

Publication information

Proteins. 2011 Jun;79(6):1952-63. doi: 10.1002/prot.23020. Epub 2011 Apr 12.

DOI:10.1002/prot.23020

PMID:21491495

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3092797/

Abstract

The identification of catalytic residues is an essential step in the functional characterization of enzymes. We present a purely structural approach to this problem, motivated by the difficulty that evolution-based methods have in annotating structural genomics targets with few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem, which stems from the overwhelming number of non-catalytic residues in enzymes compared with catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and is comparable to classifiers that use both structural and evolutionary features. In addition to evaluating the performance of catalytic residue identification, we present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our purpose-built database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
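The ingredients named in the abstract (Z scoring, spatial averaging, under-sampling, and a criterion that weighs both error types) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 10 Å radius, the 6:1 sampling ratio, and the choice of the Matthews correlation coefficient as the balanced criterion are all assumptions made for illustration only.

```python
import math
import random
from statistics import mean, pstdev


def z_scores(values):
    """Z-score one per-residue feature within a single protein."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def spatial_average(values, coords, radius=10.0):
    """Average a feature over all residues within `radius` of each residue.

    `coords` holds one (x, y, z) tuple per residue; the residue itself is
    always included. The radius value is an assumption, not from the paper.
    """
    averaged = []
    for ci in coords:
        near = [v for v, cj in zip(values, coords) if math.dist(ci, cj) <= radius]
        averaged.append(mean(near))
    return averaged


def undersample(labels, ratio=6, seed=0):
    """Keep every catalytic residue (label 1) and at most `ratio` times as
    many randomly chosen non-catalytic residues (label 0).

    Returns the sorted indices to keep. The ratio is a hypothetical default.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    return sorted(pos + kept_neg)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient: one example of a criterion that
    accounts for both false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The third measure in the abstract, penalizing errors on catalytic residues more heavily during training, is usually a setting of the SVM library rather than separate code; in scikit-learn, for example, the `class_weight` parameter of `SVC` plays that role.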
