College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, PR China.
College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China.
J Theor Biol. 2014 Mar 7;344:12-8. doi: 10.1016/j.jtbi.2013.11.021. Epub 2013 Dec 6.
Extracting a good representation from a protein sequence is fundamental to protein structural class prediction. In this paper, we propose a novel method that predicts protein structural classes from predicted secondary structure information. At the feature extraction stage, a 13-dimensional feature vector is extracted to characterize the overall content and spatial arrangement of the secondary structural elements of a given protein sequence. Specifically, four segment-level features are designed to improve discrimination between proteins from the α/β and α+β classes. After the features are extracted, a multi-class non-linear support vector machine classifier performs the structural class prediction. We report extensive experiments comparing the proposed method with state-of-the-art methods for protein structural class prediction on three widely used low-similarity benchmark datasets: FC699, 1189 and 640. The method achieves competitive prediction accuracies, and its overall accuracy exceeds the best reported results on all three datasets.
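To make the idea of content and arrangement features concrete, the sketch below computes a few such quantities from a predicted secondary structure string. It is only an illustration: the abstract does not define the 13 features, so the three-state alphabet (H for helix, E for strand, C for coil), the function names `segments` and `illustrative_features`, and the specific features chosen here are all assumptions rather than the authors' definitions.

```python
from itertools import groupby


def segments(ss):
    """Collapse a secondary-structure string into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def illustrative_features(ss):
    """Return a few content and arrangement features for `ss`.

    `ss` is assumed to be a string over H (helix), E (strand), C (coil).
    These features are hypothetical examples, not the paper's 13 features.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    n = len(ss)
    runs = segments(ss)

    # Content features: fraction of residues in each state.
    feats = {f"content_{s}": ss.count(s) / n for s in "HEC"}

    # Segment-level features: longest helix and strand run, length-normalized.
    for s in "HE":
        lengths = [length for state, length in runs if state == s]
        feats[f"max_{s}_seg"] = max(lengths, default=0) / n

    # Arrangement feature: how often helix and strand segments alternate
    # once coil is ignored. Interleaved helices and strands give a high
    # rate; segregated helix and strand regions give a low one.
    order = [state for state, _ in runs if state in "HE"]
    switches = sum(a != b for a, b in zip(order, order[1:]))
    feats["alternation_rate"] = switches / max(len(order) - 1, 1)
    return feats


if __name__ == "__main__":
    interleaved = "CEEECHHHHCEEECHHHHC"  # strands and helices alternate
    segregated = "CHHHHCHHHHCEEECEEEC"   # helices first, then strands
    print(illustrative_features(interleaved)["alternation_rate"])
    print(illustrative_features(segregated)["alternation_rate"])
```

A vector built from features of this kind could then be passed to a multi-class non-linear SVM, as the abstract describes. The alternation rate is included because the two toy strings above have identical helix and strand content and differ only in how the segments are ordered, which is the kind of distinction segment-level features are meant to capture for the α/β and α+β classes.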