Li Xiaowei, Liu Taigang, Tao Peiying, Wang Chunhua, Chen Lanming
College of Food Science & Technology, Shanghai Ocean University, Shanghai 201306, China.
College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
Comput Biol Chem. 2015 Dec;59 Pt A:95-100. doi: 10.1016/j.compbiolchem.2015.08.012. Epub 2015 Sep 2.
Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed in recent years to improve the accuracy of protein structural class prediction, but prediction remains challenging for low-similarity sequences. In this study, we introduce a feature extraction technique based on the auto cross covariance (ACC) transformation of the position-specific score matrix (PSSM) to represent a protein sequence. Support vector machine-recursive feature elimination (SVM-RFE) is then adopted to select the top K features according to their importance, and these features are input to a support vector machine (SVM) to perform the prediction. The proposed method is evaluated with the jackknife test on three low-similarity datasets, i.e., D640, 1189, and 25PDB. It achieves overall accuracies of 97.2%, 96.2%, and 93.3% on these three datasets, respectively, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class, especially for low-similarity datasets.
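The ACC feature extraction step can be illustrated with a minimal pure-Python sketch. It assumes the auto cross covariance formulation commonly applied to PSSMs: for a lag lg, each pair of PSSM columns (i1, i2) contributes the average product of their mean-centred scores lg residues apart. When i1 = i2 this gives an auto covariance (AC) term, and when i1 ≠ i2 it gives a cross covariance (CC) term, so a 20-column PSSM yields 400 values per lag. The abstract does not state the maximum lag, any PSSM normalisation, or the feature ordering, so the function name `acc_features`, the parameter `max_lag`, and the ordering used here (by lag, then by column pair) are hypothetical choices for illustration.

```python
from typing import List

# Hypothetical representation: L rows (residues) x 20 columns (amino acid types).
Matrix = List[List[float]]


def acc_features(pssm: Matrix, max_lag: int) -> List[float]:
    """Auto cross covariance (ACC) features of a PSSM (illustrative sketch).

    Returns one value per (lag, column i1, column i2) for lags 1..max_lag,
    i.e. n_cols * n_cols * max_lag values. i1 == i2 gives AC terms,
    i1 != i2 gives CC terms. Ordering is an assumption, not the paper's.
    """
    if not pssm or not 1 <= max_lag < len(pssm):
        raise ValueError("max_lag must be in [1, sequence length - 1]")
    n_rows = len(pssm)
    n_cols = len(pssm[0])

    # Column means: average score of each amino acid type along the sequence.
    means = [sum(row[c] for row in pssm) / n_rows for c in range(n_cols)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        n_pairs = n_rows - lag  # residue pairs separated by `lag` positions
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                # Average product of mean-centred scores `lag` residues apart.
                total = sum(
                    (pssm[j][c1] - means[c1]) * (pssm[j + lag][c2] - means[c2])
                    for j in range(n_pairs)
                )
                features.append(total / n_pairs)
    return features
```

On a toy two-column "PSSM" whose rows alternate, `acc_features([[1, 0], [0, 1], [1, 0], [0, 1]], 1)` returns `[-0.25, 0.25, 0.25, -0.25]`: the same-column (AC) terms are negative because each column alternates from one residue to the next, while the cross-column (CC) terms are positive. In the pipeline the abstract describes, vectors like this would then be ranked by SVM-RFE and the top K passed to an SVM. scikit-learn's `RFE` wrapped around a linear `SVC` is one off-the-shelf way to approximate that step, though the abstract does not name the implementation the authors used.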