Zhang Lichao, Zhao Xiqiang, Kong Liang
College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China.
College of Mathematical Science, Ocean University of China, Songling Road, Qingdao 266100, PR China.
J Theor Biol. 2014 Aug 21;355:105-10. doi: 10.1016/j.jtbi.2014.04.008. Epub 2014 Apr 13.
Knowledge of a protein's structural class plays an important role in characterizing its overall folding type. Extracting discriminative information from the amino acid sequence alone remains a challenge in computational biology, particularly for structural class prediction on low-similarity sequences. In this study, a novel sequence representation method based on the position-specific scoring matrix (PSSM) is proposed for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension that represent the evolutionary difference information between adjacent residues of a given protein. To implement and evaluate the proposed method, a support vector machine and jackknife tests are applied to three widely used datasets, 25PDB, 1189 and 640, with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison with previous methods shows that our method may provide a promising approach to predicting protein structural class, especially for low-similarity sequences.
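The abstract does not give the evolutionary difference formula itself, so the following is only a minimal sketch of how a PSSM of any length could be reduced to a fixed-dimension vector built from differences between adjacent residues. The function name `evolutionary_difference_features`, the `gap` parameter, and the mean-squared-difference form are assumptions made for illustration and are not taken from the paper.

```python
def evolutionary_difference_features(pssm, gap=1):
    """Reduce an L x C PSSM (C = 20 for amino acids) to a C*C vector.

    Assumed form, not the paper's formula: for every pair of columns (i, j),
    average the squared difference between the score in column i at
    position t and the score in column j at position t + gap.
    """
    length = len(pssm)
    if length <= gap:
        raise ValueError("sequence must be longer than the gap")
    n_cols = len(pssm[0])
    features = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = 0.0
            for t in range(length - gap):
                diff = pssm[t][i] - pssm[t + gap][j]
                total += diff * diff
            # Averaging over positions removes the dependence on length.
            features.append(total / (length - gap))
    return features


# Toy PSSM with 3 positions and only 2 columns, to keep the example readable.
toy_pssm = [[1, 2], [3, 4], [5, 6]]
print(evolutionary_difference_features(toy_pssm))  # [4.0, 9.0, 1.0, 4.0]
```

With a real 20-column PSSM the output always has 400 entries regardless of sequence length, which is the property that lets proteins of different lengths be fed to a classifier such as a support vector machine.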