Tao Peiying, Liu Taigang, Li Xiaowei, Chen Lanming
College of Food Science and Technology, Shanghai Ocean University, Shanghai, 201306, China.
Amino Acids. 2015 Mar;47(3):461-8. doi: 10.1007/s00726-014-1878-9. Epub 2015 Jan 13.
Knowledge of structural class plays an important role in understanding protein folding patterns. As an intermediate step toward recognizing the three-dimensional structure of a protein, structural class prediction is considered an important and challenging task. In this study, we first introduce a feature extraction technique based on tri-grams computed directly from the position-specific scoring matrix (PSSM), which yields a total of 8,000 features to represent a protein. Support vector machine-recursive feature elimination (SVM-RFE) is then applied for feature selection, and the reduced feature set is input to a support vector machine (SVM) classifier to predict the structural class of a given protein. To examine the effectiveness of our method, jackknife tests are performed on six widely used benchmark datasets: Z277, Z498, 1189, 25PDB, D640, and D1185. Overall accuracies of 97.1, 98.6, 92.5, 93.5, 94.2, and 95.9% are achieved on these datasets, respectively. Comparison with other prediction methods shows that the proposed method is very promising for protein structural class prediction.
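The abstract does not give the tri-gram formula, so the following is only a minimal sketch under a stated assumption: each feature is taken to be the sum, over all windows of three consecutive residues, of the product of three PSSM entries, one per position and one per amino-acid column. With 20 columns this gives 20 × 20 × 20 = 8,000 features, which matches the count reported above. The function name `trigram_pssm_features` is hypothetical, and any normalization of the PSSM before this step is not specified in the abstract and is left out here.

```python
from itertools import product


def trigram_pssm_features(pssm):
    """Return tri-gram features for a PSSM given as L rows of 20 scores.

    Assumed definition (not stated in the abstract):
        T[m, n, r] = sum_i pssm[i][m] * pssm[i + 1][n] * pssm[i + 2][r]
    for i = 0 .. L - 3. The result is flattened so that the feature for
    (m, n, r) sits at index m * 400 + n * 20 + r when there are 20 columns.
    """
    n_cols = len(pssm[0])
    features = [0.0] * (n_cols ** 3)
    # Slide a window of three consecutive rows along the sequence.
    for i in range(len(pssm) - 2):
        first, second, third = pssm[i], pssm[i + 1], pssm[i + 2]
        # product() iterates with m slowest and r fastest, matching the
        # flattened index described in the docstring.
        for k, (m, n, r) in enumerate(product(range(n_cols), repeat=3)):
            features[k] += first[m] * second[n] * third[r]
    return features


if __name__ == "__main__":
    # Toy 5-residue PSSM with made-up scores, for illustration only.
    toy_pssm = [[((i + j) % 7) / 7.0 for j in range(20)] for i in range(5)]
    feats = trigram_pssm_features(toy_pssm)
    print(len(feats))
```

The output length depends only on the number of PSSM columns, not on the protein length, so proteins of different lengths map to vectors of the same size, which is what a feature selection step and an SVM classifier require as input.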