Cai Yu-Dong, Chou Kuo-Chen
Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
J Proteome Res. 2005 May-Jun;4(3):967-71. doi: 10.1021/pr0500399.
As part of a continuing effort to identify enzymatic function at a deeper level using a sequence-based approach, this study extends the investigation from the main enzyme classes (Protein Sci. 2004, 13, 2857-2863) to their subclasses. This step is indispensable for understanding the molecular mechanism of an enzyme in greater detail. For each of the six main enzyme classes (i.e., oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases), a subclass training dataset was constructed. To reduce homology bias, a stringent cutoff was imposed: every pair of entries in a dataset shares less than 40% sequence identity. To capture the core features intimately related to biological function, each protein sample is represented by hybridizing its functional domain (FunD) composition with its pseudo amino acid (PseAA) composition. On the basis of this hybrid representation, the FunD-PseAA predictor was established. Jackknife cross-validation tests show that the overall success rate in identifying the 21 subclasses of oxidoreductases is above 86%, and the corresponding rates for the subclasses of the other five main enzyme classes are 94-97%. These high success rates suggest that the FunD-PseAA predictor may become a useful tool for bioinformatics and proteomics in the post-genomic era.
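The abstract names three ingredients: a functional domain (FunD) composition, a pseudo amino acid (PseAA) composition, and jackknife (leave-one-out) cross-validation. The following is a minimal sketch of how these pieces could fit together, under assumptions that the abstract does not state: the FunD vector is taken to be binary (1 if the protein hits a domain, else 0), the "hybridization" is modeled as simple concatenation, the PseAA follows the commonly cited type-1 form (20 normalized amino acid frequencies plus λ sequence-order correlation factors averaged over standardized property scales), the weight w = 0.05 and λ = 3 are illustrative defaults, and a 1-nearest-neighbour classifier stands in for whatever prediction operator the paper actually uses. All function names, domain IDs, sequences, and property values are hypothetical placeholders, not data from the study.

```python
import math
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a residue->value table to zero mean and unit population std over the 20 amino acids."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(vals), pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaa(seq, props, lam=3, w=0.05):
    """Type-1 pseudo amino acid composition (20 + lam components); seq must use standard residues only."""
    props = [standardize(p) for p in props]
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(a) / L for a in AMINO_ACIDS]

    def theta(r1, r2):
        # Mean squared difference of the standardized properties of two residues.
        return sum((p[r2] - p[r1]) ** 2 for p in props) / len(props)

    # j-tier correlation factor: average theta over all residue pairs j positions apart.
    thetas = [sum(theta(seq[i], seq[i + j]) for i in range(L - j)) / (L - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def domain_vector(hits, domain_index):
    """Hypothetical binary FunD composition: 1.0 if the protein hits domain d, else 0.0."""
    return [1.0 if d in hits else 0.0 for d in domain_index]


def hybrid_vector(seq, hits, domain_index, props, lam=3, w=0.05):
    """Assumed hybrid representation: FunD vector concatenated with the PseAA vector."""
    return domain_vector(hits, domain_index) + pseaa(seq, props, lam, w)


def jackknife_accuracy(vectors, labels):
    """Leave-one-out success rate using a 1-nearest-neighbour (Euclidean) stand-in classifier."""
    correct = 0
    for i, v in enumerate(vectors):
        # Predict sample i from its nearest neighbour among all *other* samples.
        best = min((j for j in range(len(vectors)) if j != i),
                   key=lambda j: math.dist(v, vectors[j]))
        correct += labels[best] == labels[i]
    return correct / len(vectors)


if __name__ == "__main__":
    # Placeholder property scales: NOT real physicochemical data, illustration only.
    toy_props = [{a: float(i) for i, a in enumerate(AMINO_ACIDS)},
                 {a: float((7 * i) % 20) for i, a in enumerate(AMINO_ACIDS)}]
    domains = ["DOM_A", "DOM_B", "DOM_C"]  # hypothetical domain identifiers
    proteins = [
        ("MKTAYIAKQRQISFVKSH", {"DOM_A"}, "subclass_1"),
        ("MKTAYLAKQRQVSFVKSH", {"DOM_A"}, "subclass_1"),
        ("GSHMLEDPVAGLLEKWPD", {"DOM_B"}, "subclass_2"),
        ("GSHMIEDPVAGLLEKWPE", {"DOM_B", "DOM_C"}, "subclass_2"),
    ]
    vecs = [hybrid_vector(s, h, domains, toy_props) for s, h, _ in proteins]
    labels = [c for _, _, c in proteins]
    print("jackknife success rate:", jackknife_accuracy(vecs, labels))
```

The jackknife loop mirrors the test named in the abstract: each sample is held out in turn and predicted from all the rest, so the reported rate uses every sample exactly once for testing. Replacing the stand-in classifier or the concatenation step only changes `jackknife_accuracy` or `hybrid_vector`.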