Christina Leslie, Eleazar Eskin, William Stafford Noble
Department of Computer Science, Columbia University, New York, NY 10027, USA.
Pac Symp Biocomput. 2002:564-75.
We introduce a new sequence-similarity kernel, the spectrum kernel, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. Our kernel is conceptually simple and efficient to compute and, in experiments on the SCOP database, performs well in comparison with state-of-the-art methods for homology detection. Moreover, our method produces an SVM classifier that allows linear time classification of test sequences. Our experiments provide evidence that string-based kernels, in conjunction with SVMs, could offer a viable and computationally efficient alternative to other methods of protein classification and homology detection.
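The abstract does not define the kernel, so the following is only a minimal sketch under stated assumptions: the k-spectrum of a sequence is taken to be the count of each contiguous length-k substring (k-mer), and the kernel value is the inner product of two such count vectors. The function names, the default k = 3, and the `(coefficient, sequence)` pairs standing in for a trained SVM's support vectors are all hypothetical choices made for illustration, not the authors' implementation. The second half shows one way the linear-time classification claim can be realized: the support vectors are folded into a single weight per k-mer, so a test sequence is scored in one pass.

```python
from collections import Counter, defaultdict


def spectrum(seq, k):
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    cx, cy = spectrum(x, k), spectrum(y, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * cy[kmer] for kmer, n in cx.items() if kmer in cy)


def kmer_weights(support, k):
    """Fold (coefficient, sequence) pairs into one weight per k-mer.

    Each coefficient is a stand-in for alpha_i * y_i of a trained SVM
    (hypothetical values here; no SVM is trained in this sketch).
    """
    weights = defaultdict(float)
    for coef, seq in support:
        for kmer, n in spectrum(seq, k).items():
            weights[kmer] += coef * n
    return weights


def score(seq, weights, k, bias=0.0):
    """Score a test sequence with one table lookup per k-mer position."""
    total = sum(weights.get(seq[i:i + k], 0.0)
                for i in range(len(seq) - k + 1))
    return total + bias


if __name__ == "__main__":
    support = [(0.5, "ABAB"), (-1.0, "BABA")]  # toy data, not real proteins
    w = kmer_weights(support, k=2)
    print(spectrum_kernel("ABAB", "BABA", k=2))
    print(score("ABA", w, k=2))
```

Scoring through the precomputed weight table gives the same value as summing coefficient-weighted kernel evaluations against each support sequence, but its cost depends only on the length of the test sequence rather than on the number of support vectors.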