Non-parametric classification of protein secondary structures.

Author information

Zintzaras Elias, Brown Nigel P, Kowald Axel

Affiliation

Department of Biomathematics, University of Thessaly School of Medicine, Greece.

Publication information

Comput Biol Med. 2006 Feb;36(2):145-56. doi: 10.1016/j.compbiomed.2004.10.001. Epub 2005 Jan 19.

DOI:10.1016/j.compbiomed.2004.10.001

PMID:16389074

Abstract

Proteins were classified into their families using a classification tree method based on the coefficients of variation of physico-chemical and geometrical properties of their secondary structures. The splitting criterion is the increase in purity obtained when a node is split into two subnodes, and the size of the tree is controlled by a threshold on the improvement in the apparent misclassification rate (AMR) of the tree after each splitting step. The classification tree method appears effective in reproducing structural groupings similar to those obtained by dynamic programming. For comparison, we also applied two other methods: neural networks and support vector machines. We show that the presented classification tree method performs better than either at classifying proteins into their families. The presented algorithm might be suitable for a rapid preliminary classification of proteins into their corresponding families.
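
The tree-growing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the purity measure or the threshold value, so the Gini index, the population standard deviation in the coefficient of variation, and all names below (`coefficient_of_variation`, `best_split`, `grow`, `predict`, `min_amr_gain`) are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD is an assumption)."""
    return pstdev(values) / mean(values)


def gini(labels):
    """Gini impurity; assumed here as the purity measure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def misclassified(labels):
    """Cases in a node that do not belong to its majority class."""
    return len(labels) - max(Counter(labels).values())


def best_split(rows, labels):
    """Return (purity_gain, feature_index, threshold) of the best binary split."""
    n = len(rows)
    parent = gini(labels)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            gain = (parent
                    - len(left) / n * gini(left)
                    - len(right) / n * gini(right))
            if gain > best[0]:
                best = (gain, j, t)
    return best


def grow(rows, labels, n_total, min_amr_gain):
    """Split recursively; stop when a split lowers the AMR by less than
    min_amr_gain (a hypothetical threshold parameter)."""
    majority = Counter(labels).most_common(1)[0][0]
    _, j, t = best_split(rows, labels)
    if j is None:  # no split increases purity
        return {"leaf": majority}
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    ly = [y for _, y in left]
    ry = [y for _, y in right]
    # Drop in apparent misclassification rate, measured on the training data
    amr_gain = (misclassified(labels) - misclassified(ly) - misclassified(ry)) / n_total
    if amr_gain < min_amr_gain:
        return {"leaf": majority}
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], ly, n_total, min_amr_gain),
        "right": grow([r for r, _ in right], ry, n_total, min_amr_gain),
    }


def predict(tree, row):
    """Walk the tree until a leaf is reached and return its family label."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data: each row holds two made-up CV features; labels are made-up families.
rows = [[0.1, 5.0], [0.2, 4.0], [0.8, 5.5], [0.9, 4.5]]
labels = ["A", "A", "B", "B"]
tree = grow(rows, labels, n_total=len(rows), min_amr_gain=0.05)
```

In this sketch each protein would be represented by one row of coefficients of variation, one per property, and `predict(tree, row)` returns the family assigned to a new row. Because the AMR is computed on the training data itself, it is an optimistic estimate of the true error, which is why it is called "apparent".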
