Using pseudo amino acid composition to predict protease families by incorporating a series of protein biological features.

Authors

Hu Lele, Zheng Lulu, Wang Zhiwen, Li Bing, Liu Lei

Affiliation

Institute of Systems Biology, Shanghai University, Shanghai 200444, China.

Publication information

Protein Pept Lett. 2011 Jun;18(6):552-8. doi: 10.2174/092986611795222795.

DOI:10.2174/092986611795222795

PMID:21271978

Abstract

Proteases are essential to most biological processes, yet they themselves remain intact throughout those processes. In this study, a computational approach was developed for predicting protease families from sequence. Following the concept of pseudo amino acid composition, each protein sample was represented by a series of its biological features in order to capture the essential patterns of protease sequences. A total of 132 biological features were used, derived from various biochemical and physicochemical properties of the constituent amino acids. The importance of each feature to the prediction was ranked by the Maximum Relevance Minimum Redundancy (mRMR) algorithm, and Incremental Feature Selection (IFS) was then applied to select an optimal feature set, which was used to construct a predictor based on the nearest neighbor algorithm. As a demonstration, the overall success rate of the jackknife test in classifying proteases among their seven families was 92.74%. Further analysis of the optimal feature set revealed that secondary structure and amino acid composition play key roles in the classification, which is consistent with some previous findings. These promising results suggest that the predictor presented in this paper may become a useful tool for studying proteases.
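
The pipeline summarized in the abstract (feature ranking by Maximum Relevance Minimum Redundancy, Incremental Feature Selection, a nearest neighbor predictor, and jackknife evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-frequency binning with three bins, the difference form of the mRMR score, and the Euclidean distance are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (nats) between two arrays of non-negative integer codes."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)          # contingency table of co-occurrences
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def discretize(X, bins=3):
    """Assumed preprocessing: equal-frequency binning of each feature column."""
    codes = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, bins + 1)[1:-1])
        codes[:, j] = np.searchsorted(edges, X[:, j])
    return codes

def mrmr_rank(X, y, bins=3):
    """Greedy mRMR ranking: relevance to y minus mean redundancy with chosen features."""
    D = discretize(X, bins)
    n = D.shape[1]
    relevance = np.array([mutual_info(D[:, j], y) for j in range(n)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n)
    while len(selected) < n:
        last = selected[-1]
        redundancy_sum += np.array(
            [mutual_info(D[:, j], D[:, last]) for j in range(n)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf        # never re-select a chosen feature
        selected.append(int(np.argmax(score)))
    return selected

def jackknife_1nn_accuracy(X, y):
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbor classifier."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean
    np.fill_diagonal(d, np.inf)          # a sample may not be its own neighbor
    return float((y[d.argmin(axis=1)] == y).mean())

def incremental_feature_selection(X, y, ranking):
    """Evaluate the top-k ranked features for k = 1..n and keep the best prefix."""
    accs = [
        jackknife_1nn_accuracy(X[:, ranking[:k]], y)
        for k in range(1, len(ranking) + 1)
    ]
    best_k = int(np.argmax(accs)) + 1    # smallest k reaching the top accuracy
    return ranking[:best_k], accs
```

Given a hypothetical feature matrix `X` with shape (samples, features) and an integer label vector `y` with one code per family, `incremental_feature_selection(X, y, mrmr_rank(X, y))` returns the selected feature indices together with the jackknife accuracy for each prefix of the ranking.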
