SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees.

Authors

Oğul Hasan, Mumcuoğlu Erkan U

Affiliation

Department of Computer Engineering, Başkent University, 06530 Ankara, Turkey.

Publication information

Comput Biol Chem. 2006 Aug;30(4):292-9. doi: 10.1016/j.compbiolchem.2006.05.001.

DOI:10.1016/j.compbiolchem.2006.05.001

PMID:16880118

Abstract

A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar examples from dissimilar ones. The new discriminative system, which we call SVM-PST, was tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. PST-based comparison is also much more computationally efficient than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model.
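
The feature encoding described in the abstract can be illustrated with a minimal sketch. The abstract does not give the PST construction or the scoring formula, so everything below is an assumption made for illustration: a simplified variable-order Markov model stands in for a PST, the score is the average log-probability per residue, and the names `build_context_counts`, `similarity`, and `feature_vector`, the context depth, and the pseudocount are hypothetical. It only shows how each sequence could be turned into a vector of pairwise scores against the training sequences; such vectors would then be passed to an SVM.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_context_counts(seq, max_depth=2):
    """Count next-symbol occurrences after every context of length 0..max_depth.

    This is a simplified stand-in for a PST trained on a single sequence
    (an assumption; the paper's actual construction is not in the abstract).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(seq):
        for depth in range(max_depth + 1):
            if i - depth < 0:
                break
            counts[seq[i - depth:i]][symbol] += 1
    return counts


def symbol_probability(counts, context, symbol, pseudocount=1.0):
    """Smoothed P(symbol | longest suffix of context seen in training)."""
    while context and context not in counts:
        context = context[1:]  # back off to a shorter suffix
    seen = counts.get(context, {})
    total = sum(seen.values()) + pseudocount * len(ALPHABET)
    return (seen.get(symbol, 0) + pseudocount) / total


def similarity(model, query, max_depth=2):
    """Average log-probability per residue of `query` under `model`."""
    log_prob = 0.0
    for i, symbol in enumerate(query):
        context = query[max(0, i - max_depth):i]
        log_prob += math.log(symbol_probability(model, context, symbol))
    return log_prob / len(query)


def feature_vector(query, models, max_depth=2):
    """Encode a sequence as its scores against every training-sequence model."""
    return [similarity(model, query, max_depth) for model in models]


# Toy sequences, invented for illustration only.
train = ["ACACACACAC", "GGGGTTTTGG"]
models = [build_context_counts(s) for s in train]
vec = feature_vector("ACACAC", models)
```

In this toy run the query scores higher against the first training sequence, which shares its repeating pattern, than against the second. Each model is built in a single pass over one sequence and scoring is linear in the query length, whereas dynamic-programming alignment of two sequences takes time proportional to the product of their lengths.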

Similar articles

SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees.

Comput Biol Chem. 2006 Aug;30(4):292-9. doi: 10.1016/j.compbiolchem.2006.05.001.

A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets.

Biosystems. 2007 Jan;87(1):75-81. doi: 10.1016/j.biosystems.2006.03.006. Epub 2006 Mar 28.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.

Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.

A feature vector integration approach for a generalized support vector machine pairwise homology algorithm.

Comput Biol Chem. 2008 Dec;32(6):458-61. doi: 10.1016/j.compbiolchem.2008.07.017. Epub 2008 Jul 16.

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.

Comput Biol Med. 2011 Aug;41(8):687-99. doi: 10.1016/j.compbiomed.2011.06.004. Epub 2011 Jun 25.

SVM-BALSA: remote homology detection based on Bayesian sequence alignment.

Comput Biol Chem. 2005 Dec;29(6):440-3. doi: 10.1016/j.compbiolchem.2005.09.006. Epub 2005 Nov 10.

Protein classification based on text document classification techniques.

Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.

Application of latent semantic analysis to protein remote homology detection.

Bioinformatics. 2006 Feb 1;22(3):285-90. doi: 10.1093/bioinformatics/bti801. Epub 2005 Nov 29.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

Cited by

LAF: Logic Alignment Free and its application to bacterial genomes classification.

BioData Min. 2015 Dec 8;8:39. doi: 10.1186/s13040-015-0073-1. eCollection 2015.

Template-based protein modeling: recent methodological advances.

Curr Top Med Chem. 2010;10(1):84-94. doi: 10.2174/156802610790232314.

An in silico strategy identified the target gene candidates regulated by dehydration responsive element binding proteins (DREBs) in Arabidopsis genome.

Plant Mol Biol. 2009 Jan;69(1-2):167-78. doi: 10.1007/s11103-008-9414-5. Epub 2008 Oct 18.
