School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China.
Biochimie. 2010 Oct;92(10):1330-4. doi: 10.1016/j.biochi.2010.06.013. Epub 2010 Jun 23.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines a support vector machine with PSI-BLAST profiles, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence similarity lower than 40% and 25%, respectively. The overall accuracies reach 70.7% and 72.9% for the 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important role complementary to existing methods.
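The feature-extraction and evaluation steps described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the profile is taken to be an L x 20 matrix of already-normalized scores, the profile-based amino acid composition is taken to be the column means (20 values), and the profile-based dipeptide composition is taken to be the averaged products of scores at adjacent positions (400 values). The function names `profile_aac`, `profile_dpc`, `profile_features`, and `jackknife_accuracy` are hypothetical, and the classifier is passed in as a callback rather than being a specific SVM implementation.

```python
from typing import Callable, List

Profile = List[List[float]]  # one row per residue, one column per amino acid


def profile_aac(profile: Profile) -> List[float]:
    """Assumed profile-based amino acid composition: mean of each column."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def profile_dpc(profile: Profile) -> List[float]:
    """Assumed profile-based dipeptide composition.

    For each ordered column pair (i, j), average the product of the score
    for column i at position k and column j at position k + 1.
    """
    length = len(profile)
    if length < 2:
        raise ValueError("need at least two residues for dipeptide features")
    n_cols = len(profile[0])
    feats = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(profile[k][i] * profile[k + 1][j]
                        for k in range(length - 1))
            feats.append(total / (length - 1))
    return feats


def profile_features(profile: Profile) -> List[float]:
    """Fixed-length vector (20 + 400 values for a 20-column profile)."""
    return profile_aac(profile) + profile_dpc(profile)


def jackknife_accuracy(
    features: List[List[float]],
    labels: List[str],
    fit_predict: Callable[[List[List[float]], List[str], List[float]], str],
) -> float:
    """Leave-one-out (jackknife) test: hold out each protein once.

    fit_predict(train_x, train_y, query) is a hypothetical callback that
    trains any classifier on the remaining proteins and labels the query.
    """
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

Whatever the sequence length, a 20-column profile yields a vector of 420 values, which is what allows proteins of different lengths to be passed to a single classifier. In practice the `fit_predict` callback would wrap an SVM from a library of the reader's choice.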