Sun X-D, Huang R-B
College of Life Science and Biotechnology, Guangxi University, Nanning, Guangxi, China.
Amino Acids. 2006 Jun;30(4):469-75. doi: 10.1007/s00726-005-0239-0. Epub 2006 Apr 20.
The support vector machine (SVM), a machine-learning method, is used to predict the four structural classes, i.e. mainly alpha, mainly beta, alpha-beta and fss (few secondary structures), from the topology level of the CATH protein structure database. In binary classification, any two structural classes that share no secondary structure elements, such as alpha and beta elements, can be classified with accuracy as high as 90%. The accuracy, however, decreases to less than 70% if the structural classes to be classified have structure elements in common. Our study also shows that feature spaces of dimension 20^2 = 400 (dipeptide) and 20^3 = 8000 (tripeptide) give nearly the same prediction accuracy. Across these four structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that the multi-class classification technique for support vector machines may still need further improvement in future investigations.
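To make the feature-space dimensions concrete, the following minimal sketch builds a k-peptide composition vector from a protein sequence: 400 entries for dipeptides (k = 2) and 8000 for tripeptides (k = 3). The abstract does not state the exact encoding, so the normalised-frequency scheme, the overlapping window, the skipping of non-standard residues, and the function name `kmer_composition` are all assumptions made for illustration, not the authors' procedure.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2):
    """Return the normalised k-peptide frequency vector of a sequence.

    Hypothetical helper: the vector has 20**k entries, ordered
    lexicographically over AMINO_ACIDS. Windows that contain a
    non-standard residue are skipped (an assumption).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide an overlapping window of width k along the sequence.
    for start in range(len(sequence) - k + 1):
        pos = index.get(sequence[start:start + k])
        if pos is not None:
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


dipeptide_vector = kmer_composition("ACDA", k=2)    # 400 features
tripeptide_vector = kmer_composition("ACDA", k=3)   # 8000 features
```

Vectors of this kind could then be passed to any SVM implementation as input features; the kernel and multi-class strategy used in the study are not given in the abstract.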