[Classification of multi-class homo-oligomer based on a novel method of feature extraction from protein primary structure].

Authors

Zhang Shaowu, Pan Quan, Zhao Chunhui, Cheng Yongmei

Affiliation

School of Automatic Control, Northwestern Polytechnic University, Xi'an 710072, China.

Publication

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2007 Aug;24(4):721-6.

Abstract

A novel method of feature extraction from protein primary structure is proposed and applied to classify protein homodimers, homotrimers, homotetramers and homohexamers. In this method, a protein sequence is represented by a feature vector composed of its amino acid compositions and a set of weighted auto-correlation function factors of an amino acid residue index. High classification accuracies are obtained. For example, with the same support vector machine (SVM), the total accuracies of the QIANA, AIANB, MEEJ, ROBB and SNEP sets based on this feature extraction method are 77.63%, 77.16%, 76.46%, 76.70% and 75.06%, respectively, in the jackknife test. These are 6.39, 5.92, 5.22, 5.46 and 3.82 percentage points higher, respectively, than the accuracy of the COMP set, which is based on the conventional method composed of amino acid compositions. With the same QIANA set, the total accuracy of the SVM is 77.63%, which is 16.29 percentage points higher than that of the covariant discriminant algorithm. These results show that: (1) the novel feature extraction method is effective and feasible, and the feature vectors based on it may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches buried in the interfaces of associated subunits; (2) the SVM can be regarded as a powerful computational tool for classifying protein homo-oligomers.
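The feature vector described in the abstract (amino acid compositions followed by weighted auto-correlation factors of a residue index) can be illustrated with a minimal sketch. The abstract does not say which residue index, how many lags, or what weight the authors used, so everything specific here is an assumption: the Kyte-Doolittle hydropathy scale stands in for the residue index, the auto-correlation factor at lag k is taken as the mean of h(i)·h(i+k) over the sequence, and the names `feature_vector`, `max_lag` and `weight` are hypothetical.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: Kyte-Doolittle hydropathy is used as the residue index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each of the 20 standard amino acids."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Return the mean of h(i) * h(i + k) for each lag k = 1..max_lag."""
    values = [HYDROPATHY[aa] for aa in seq]
    n = len(values)
    factors = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        factors.append(total / (n - lag))
    return factors


def feature_vector(seq, max_lag=5, weight=0.1):
    """Concatenate the composition with weighted auto-correlation factors.

    max_lag and weight are hypothetical defaults chosen for illustration.
    """
    seq = seq.upper()
    if any(aa not in HYDROPATHY for aa in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    weighted = [weight * r for r in autocorrelation(seq, max_lag)]
    return composition(seq) + weighted


if __name__ == "__main__":
    vec = feature_vector("MKTAYIAKQRQISFVKSHFSRQ", max_lag=5)
    print(len(vec))  # 20 composition terms plus 5 auto-correlation terms
```

A vector of this shape (20 + `max_lag` entries per sequence) could then be passed to any SVM implementation. The conventional baseline mentioned in the abstract corresponds to keeping only the 20 composition terms.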
