Predicting enzyme family classes by hybridizing gene product composition and pseudo-amino acid composition.

Authors

Cai Yu-Dong, Zhou Guo-Ping, Chou Kuo-Chen

Affiliation

Biomolecular Sciences Department, University of Manchester Institute of Science & Technology, PO Box 88, Manchester, M60 1QD, UK

Publication information

J Theor Biol. 2005 May 7;234(1):145-9. doi: 10.1016/j.jtbi.2004.11.017. Epub 2005 Jan 26.

Abstract

A new method has been developed to predict the enzymatic attribute of proteins by hybridizing the gene product composition and pseudo amino acid composition. As a demonstration, a working dataset was generated with a cutoff of 60% sequence identity to avoid redundancy and bias in statistical prediction. The dataset thus constructed contains 39989 protein sequences, of which 27469 are non-enzymes and 12520 enzymes that were further classified into 6 enzyme family classes according to their 6 main EC (Enzyme Commission) numbers (2314 are oxidoreductases, 3653 transferases, 3246 hydrolases, 1307 lyases, 676 isomerases, and 1324 ligases). The overall success rate by the jackknife test for the identification between enzyme and non-enzyme was 94%, and that for the identification among the 6 enzyme family classes was 98%. It is anticipated that, with the rapid increase of protein sequences entering into databanks, the current method will become a useful automated tool in identifying the enzymatic attribute of a newly found protein sequence.
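
The jackknife test mentioned in the abstract holds out each sequence in turn, predicts its class from all remaining sequences, and reports the fraction predicted correctly. The following is a minimal sketch of that procedure only. The abstract does not name the classifier or give the feature encoding, so the nearest-neighbour rule, the function name `jackknife_success_rate`, and the toy feature vectors are all assumptions made for illustration and are not taken from the paper.

```python
from math import dist


def jackknife_success_rate(features, labels):
    """Leave-one-out estimate: each sample is predicted from all the others."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Hold out sample i; every other sample forms the training set.
        rest = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        # Stand-in classifier (assumption): take the label of the nearest neighbour.
        _, predicted = min(rest, key=lambda pair: dist(pair[0], x))
        correct += predicted == y
    return correct / len(labels)


# Hypothetical toy feature vectors and labels, not data from the paper.
toy_features = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.0, 0.9), (0.9, 1.0), (1.1, 1.0)]
toy_labels = ["non-enzyme"] * 3 + ["enzyme"] * 3

rate = jackknife_success_rate(toy_features, toy_labels)
print(f"jackknife success rate: {rate:.2f}")
```

In a real setting each feature vector would be replaced by the hybrid descriptor of a protein, and the same loop would run once for the enzyme versus non-enzyme task and once for the 6 enzyme family classes.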
