Multi-class support vector machines for protein secondary structure prediction.

Author information

Nguyen Minh N, Rajapakse Jagath C

Affiliation

School of Computer Engineering, Nanyang Technological University, Singapore.

Publication information

Genome Inform. 2003;14:218-27.

PMID:15706536

Abstract

The Support Vector Machine (SVM) method is well developed for binary classification. Multi-class classification is typically solved by combining several binary classifiers, although several multi-class methods that consider all classes at once have recently been proposed. However, these methods require solving a much larger optimization problem and are therefore practical only for small datasets. We implement three methods based on binary classification, namely one-against-all (OAA), one-against-one (OAO), and directed acyclic graph (DAG), together with two approaches that handle the multi-class problem by solving a single optimization problem, and apply them to protein secondary structure (PSS) prediction. Our experiments indicate that multi-class SVM methods are more suitable for PSS prediction than the other methods, including binary SVMs, because of their capacity to solve the optimization problem in one step. Furthermore, we argue that prediction accuracy can be improved further by adding a second-stage multi-class SVM that captures the contextual information among secondary structural elements. Using two datasets, we demonstrate that two-stage SVMs perform better than single-stage SVM techniques for PSS prediction, and we report a maximum accuracy of 79.5%.
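The three binary-combination schemes named in the abstract differ only in how the outputs of already trained binary classifiers are merged into one label. The sketch below illustrates those decision rules and nothing more: it does not train any SVM and is not the authors' implementation. The class labels H, E and C (helix, strand, coil), the function names, and the decision values in the usage lines are all assumptions made for illustration.

```python
from itertools import combinations

# Assumed three-state labelling: helix, strand, coil (not stated in the abstract).
CLASSES = ("H", "E", "C")


def predict_oaa(scores):
    """One-against-all: pick the class whose 'class vs rest' score is largest.

    scores maps each class to the decision value of its binary classifier.
    """
    return max(CLASSES, key=lambda c: scores[c])


def predict_oao(pairwise):
    """One-against-one: every pairwise classifier casts one vote.

    pairwise maps (a, b) to a decision value; a positive value favours a,
    anything else favours b. Ties are broken by the order of CLASSES.
    """
    votes = {c: 0 for c in CLASSES}
    for a, b in combinations(CLASSES, 2):
        winner = a if pairwise[(a, b)] > 0 else b
        votes[winner] += 1
    return max(CLASSES, key=lambda c: votes[c])


def predict_dag(pairwise):
    """Directed acyclic graph: each node eliminates one class.

    The first and last remaining classes are compared, the loser is dropped,
    and the walk ends when a single class is left.
    """
    remaining = list(CLASSES)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise[(a, b)] > 0:
            remaining.pop()      # b loses and is eliminated
        else:
            remaining.pop(0)     # a loses and is eliminated
    return remaining[0]


# Hypothetical decision values for a single residue.
oaa_scores = {"H": 0.4, "E": -1.2, "C": 0.1}
pair_scores = {("H", "E"): 0.8, ("H", "C"): -0.3, ("E", "C"): -0.5}

oaa_label = predict_oaa(oaa_scores)
oao_label = predict_oao(pair_scores)
dag_label = predict_dag(pair_scores)
```

With K classes, OAA needs K binary classifiers, while OAO and DAG both need K(K-1)/2. DAG evaluates only K-1 of them per prediction, whereas OAO evaluates all of them.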