SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

Author information

Iain Melvin, Eugene Ie, Rui Kuang, Jason Weston, William Stafford Noble, Christina Leslie

Affiliations

NEC Laboratories of America, Princeton, NJ, USA.

Publication information

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

DOI: 10.1186/1471-2105-8-S4-S2

PMID: 17570145

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1892081/

Abstract

BACKGROUND

Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community.

RESULTS

We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at http://svm-fold.c2b2.columbia.edu. Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider.
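The idea of learning relative weights between one-vs-the-rest classifiers can be illustrated with a minimal sketch. It is not the SVM-Fold implementation: the perceptron-style update rule, the 0/1 code matrix, the function names `predict` and `learn_code_weights`, the learning rate, and the toy scores are all assumptions made here for illustration. Each class has a code row marking which classifiers should fire for it; in a hierarchical setting a superfamily row could also mark its fold-level classifier.

```python
import numpy as np


def predict(scores, weights, codes):
    """Pick the class whose code row has the largest weighted match.

    scores:  (n_classifiers,) raw outputs of the one-vs-rest SVMs
    weights: (n_classifiers,) learned scale factor per classifier
    codes:   (n_classes, n_classifiers) 0/1 matrix; row y has a 1 for
             every classifier that should fire on class y
    """
    return int(np.argmax(codes @ (weights * scores)))


def learn_code_weights(score_matrix, labels, codes, epochs=20, lr=0.1):
    """Learn per-classifier weights with perceptron-style updates
    (the update rule is an assumption of this sketch)."""
    weights = np.ones(codes.shape[1])  # all ones == plain one-vs-all
    for _ in range(epochs):
        for scores, y in zip(score_matrix, labels):
            y_hat = predict(scores, weights, codes)
            if y_hat != y:
                # Shift weight toward the classifiers in the true code
                # and away from those in the wrongly predicted code,
                # scaled by each classifier's score.
                weights += lr * (codes[y] - codes[y_hat]) * scores
    return weights


# Toy data (made up): classifier 0 is biased upward, so its raw scores
# are not comparable with those of classifier 1.
codes = np.eye(2)
score_matrix = np.array([[3.0, -1.0],   # true class 0
                         [1.5, 1.0]])   # true class 1
labels = np.array([0, 1])

one_vs_all = [predict(s, np.ones(2), codes) for s in score_matrix]
weights = learn_code_weights(score_matrix, labels, codes)
weighted = [predict(s, weights, codes) for s in score_matrix]
print(one_vs_all, weighted)
```

On this toy data, plain one-vs-all mislabels the second example because classifier 0 scores everything high, while the learned weights scale that classifier down enough to recover the correct label.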

CONCLUSION

By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition.
