Oğul Hasan, Mumcuoğlu Erkan U
Department of Computer Engineering, Başkent University, 06530 Ankara, Turkey.
Comput Biol Chem. 2006 Aug;30(4):292-9. doi: 10.1016/j.compbiolchem.2006.05.001.
A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. In terms of computational efficiency, PST-based comparison is much more efficient than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model.
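The abstract does not give the construction details of the PST-based pairwise comparison, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a simplified variable-order context model per sequence (contexts up to depth 3, pruning of rare contexts, Laplace smoothing, log-odds against a uniform background). It then defines a symmetric pairwise score as the average of scoring each sequence under the other's tree and encodes a query as its vector of scores against reference sequences. All function names (`build_pst`, `next_prob`, `log_odds`, `pst_similarity`, `encode`) and parameters are hypothetical and are not the authors' definitions.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def build_pst(seq, max_depth=3, min_count=2):
    """Simplified PST (hypothetical): next-residue counts for every context
    (the residues immediately preceding a position) of length 0..max_depth.
    Contexts seen fewer than min_count times are pruned; the root is kept."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(seq):
        for d in range(min(i, max_depth) + 1):
            counts[seq[i - d:i]][sym] += 1
    tree = {ctx: dict(nxt) for ctx, nxt in counts.items()
            if ctx == "" or sum(nxt.values()) >= min_count}
    tree.setdefault("", {})  # empty input sequence: root with no counts
    return tree


def next_prob(tree, history, sym, max_depth, alpha=1.0):
    """Laplace-smoothed P(sym | longest suffix of history kept in the tree)."""
    for d in range(min(max_depth, len(history)), -1, -1):
        ctx = history[len(history) - d:]
        if ctx in tree:  # d == 0 gives "", the root, which is always present
            nxt = tree[ctx]
            return (nxt.get(sym, 0) + alpha) / (sum(nxt.values()) + alpha * ALPHABET_SIZE)
    return 1.0 / ALPHABET_SIZE  # not reached while the root is present


def log_odds(tree, seq, max_depth):
    """Mean per-residue log-odds of seq under the tree vs. a uniform background."""
    if not seq:
        return 0.0
    total = 0.0
    for i, sym in enumerate(seq):
        history = seq[max(0, i - max_depth):i]
        # log(p / (1/20)) rewards residues the tree predicts better than chance
        total += math.log(next_prob(tree, history, sym, max_depth) * ALPHABET_SIZE)
    return total / len(seq)


def pst_similarity(a, b, max_depth=3):
    """Hypothetical symmetric score: b under a's tree and a under b's tree."""
    ta, tb = build_pst(a, max_depth), build_pst(b, max_depth)
    return 0.5 * (log_odds(ta, b, max_depth) + log_odds(tb, a, max_depth))


def encode(query, references, max_depth=3):
    """Feature vector: similarity of the query to each reference sequence."""
    return [pst_similarity(query, r, max_depth) for r in references]
```

In a pairwise-score setup of this kind, every training and test protein would be encoded against the same list of training sequences, and the resulting vectors passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`). Scoring a sequence of length n against one tree in this sketch takes O(n × max_depth) dictionary lookups, compared with the O(nm) table of a dynamic-programming alignment. A full PST learner would apply further pruning and smoothing criteria that are omitted here.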