Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes.

Author information

Chou Kuo-Chen

Affiliation

Gordon Life Science Institute, San Diego, CA 92130, USA.

Publication information

Bioinformatics. 2005 Jan 1;21(1):10-9. doi: 10.1093/bioinformatics/bth466. Epub 2004 Aug 12.

Abstract

MOTIVATION

With protein sequences entering databanks at an explosive pace, the early determination of the family or subfamily class of a newly found enzyme molecule becomes important, because the class is directly related to which specific target the enzyme acts on, as well as to its catalytic process and biological function. Unfortunately, it is both time-consuming and costly to determine this by experiments alone. In a previous study, the covariant-discriminant algorithm was introduced to identify the 16 subfamily classes of oxidoreductases. Although the results were quite encouraging, the entire prediction process was based on the amino acid composition alone, without including any sequence-order information. The problem is therefore worthy of further investigation.

RESULTS

To incorporate sequence-order effects into the predictor, the 'amphiphilic pseudo amino acid composition' is introduced to represent the statistical sample of a protein. The novel representation contains 20 + 2λ discrete numbers: the first 20 numbers are the components of the conventional amino acid composition; the next 2λ numbers are a set of correlation factors that reflect different hydrophobicity and hydrophilicity distribution patterns along a protein chain. Based on this concept and formulation scheme, a new predictor is developed. The self-consistency test, jackknife test and independent dataset tests show that the success rates obtained by the new predictor are all significantly higher than those of the previous predictors. The significant enhancement in success rates also implies that the distribution of hydrophobicity and hydrophilicity of the amino acid residues along a protein chain plays a very important role in its structure and function.
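The abstract gives only the shape of the representation (20 composition terms followed by 2λ correlation factors), not the formulas. The Python sketch below shows one way such a vector could be assembled, and it rests on several assumptions that are not stated in the abstract: each correlation factor is taken as the mean product of standardized property values for residues k positions apart, the two property scales are illustrative stand-ins (Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity, as commonly tabulated), and the names `amphiphilic_pseaac`, `lam` and `weight` are hypothetical. It should be read as an illustration of the idea rather than as the paper's exact procedure.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative property scales (assumption: not necessarily those used in the paper).
HYDROPHOBICITY = {  # Kyte-Doolittle hydropathy
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}
HYDROPHILICITY = {  # Hopp-Woods hydrophilicity
    "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5, "G": 0.0, "H": -0.5,
    "I": -1.8, "K": 3.0, "L": -1.8, "M": -1.3, "N": 0.2, "P": 0.0, "Q": 0.2,
    "R": 3.0, "S": 0.3, "T": -0.4, "V": -1.5, "W": -3.4, "Y": -2.3,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sigma for a in AMINO_ACIDS}


def amphiphilic_pseaac(sequence, lam=3, weight=0.5):
    """Return a vector of 20 + 2*lam numbers for a protein sequence.

    lam (number of correlation tiers) and weight (balance between composition
    and correlation terms) are hypothetical parameter names and defaults.
    """
    if any(residue not in AMINO_ACIDS for residue in sequence):
        raise ValueError("sequence contains a non-standard residue")
    if len(sequence) <= lam:
        raise ValueError("sequence must be longer than lam")

    h1 = standardize(HYDROPHOBICITY)
    h2 = standardize(HYDROPHILICITY)
    n = len(sequence)

    # First 20 numbers: conventional amino acid composition.
    freqs = [sequence.count(a) / n for a in AMINO_ACIDS]

    # Next 2*lam numbers: for each gap k, one hydrophobicity and one
    # hydrophilicity correlation factor between residues k positions apart.
    taus = []
    for k in range(1, lam + 1):
        for h in (h1, h2):
            products = (h[sequence[i]] * h[sequence[i + k]] for i in range(n - k))
            taus.append(sum(products) / (n - k))

    # Joint normalization so that all 20 + 2*lam components sum to 1.
    denom = sum(freqs) + weight * sum(taus)
    return [f / denom for f in freqs] + [weight * t / denom for t in taus]
```

Calling `amphiphilic_pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)` on this made-up sequence returns 26 numbers. Because the correlation factors can be negative, individual components are not guaranteed to lie between 0 and 1, even though the vector as a whole sums to 1.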

