Hu Xiu Zhen, Li Qian Zhong
Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot, 010021, P.R. China.
Protein J. 2008 Feb;27(2):115-22. doi: 10.1007/s10930-007-9114-z.
A support vector machine (SVM) algorithm for predicting beta-hairpin motifs is proposed, in which the sequence information is expressed as a composite vector built from the increment of diversity and a scoring function. The prediction is performed on a dataset of 3,088 non-homologous proteins containing 6,027 beta-hairpins. For the independent testing dataset, the overall prediction accuracy is 79.9% and the Matthews correlation coefficient is 0.59. On a dataset previously used by Kumar et al. (Nucleic Acids Res 33:154-159), a higher accuracy of 83.3% and a Matthews correlation coefficient of 0.67 are obtained for the independent testing dataset. The performance of the method is also evaluated by predicting the beta-hairpins in the CASP6 proteins, and better results are obtained. Moreover, the method is used to predict four kinds of supersecondary structures, with an overall prediction accuracy of 64.5% for the independent testing dataset.
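The two evaluation measures reported above, overall accuracy and the Matthews correlation coefficient, can be computed from the counts of a binary confusion matrix (hairpin vs. non-hairpin). The following is a minimal sketch using the standard definitions of both measures; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero (the usual convention,
    since the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    tp, tn, fp, fn = 80, 70, 30, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when hairpin and non-hairpin examples are unevenly represented.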