Li Z-C, Zhou X-B, Lin Y-R, Zou X-Y
School of Chemistry and Chemical Engineering, Sun Yat-Sen University, 510275, Guangzhou, People's Republic of China.
Amino Acids. 2008 Oct;35(3):581-90. doi: 10.1007/s00726-008-0084-z. Epub 2008 Apr 22.
Structural class characterizes the overall folding type of a protein or its domain. Most existing methods for determining the structural class of a protein rely on a single group of features that carries only one kind of discriminative information. As a result, other types of discriminative information associated with the primary sequence are missed entirely, which undoubtedly lowers the prediction success rate. We present a novel method for predicting protein structural class that couples an improved genetic algorithm (GA) with a support vector machine (SVM). The improved GA was used both to select an optimized feature subset and to optimize the SVM parameters. Jackknife tests on the working datasets gave prediction accuracies of 97.8-100% for the individual classes, with an overall accuracy of 99.5%. These results indicate that the approach has high potential to become a useful tool in bioinformatics.
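The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a GA whose chromosome encodes a feature mask together with a classifier parameter, scored by jackknife (leave-one-out) accuracy. Everything here is an assumption made for illustration. A dependency-free k-nearest-neighbour classifier stands in for the SVM, its `k` stands in for the SVM parameters, the GA is a plain one (tournament selection, one-point crossover, bit-flip mutation, elitism) rather than the authors' improved variant, and `make_toy_data`, `loo_accuracy`, and `run_ga` are hypothetical names operating on synthetic data, not protein features.

```python
import random

def make_toy_data(n_per_class=15, seed=1):
    """Synthetic two-class data: 2 informative features followed by 4 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 0.5) for _ in range(2)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
            X.append(informative + noise)
            y.append(label)
    return X, y

def loo_accuracy(X, y, mask, k):
    """Jackknife (leave-one-out) accuracy of k-NN restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from held-out sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)

def run_ga(X, y, k_choices=(1, 3, 5), pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Jointly search feature masks and k; assumes at least 2 features."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(ind):
        mask, k = ind
        # Small per-feature penalty (an assumption) favours compact subsets.
        return loo_accuracy(X, y, mask, k) - 0.001 * sum(mask)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [
        ([rng.random() < 0.5 for _ in range(n_feat)], rng.choice(k_choices))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        next_pop = [max(scored, key=lambda s: s[0])[1]]  # elitism: keep the best
        while len(next_pop) < pop_size:
            (m1, k1), (m2, k2) = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_feat)  # one-point crossover on the mask
            mask = m1[:cut] + m2[cut:]
            mask = [(not bit) if rng.random() < p_mut else bit for bit in mask]
            k = rng.choice((k1, k2))
            if rng.random() < p_mut:  # mutate the parameter gene
                k = rng.choice(k_choices)
            next_pop.append((mask, k))
        population = next_pop
    best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = make_toy_data()
    (mask, k), score = run_ga(X, y)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("k:", k, "penalised jackknife accuracy:", round(score, 3))
```

Encoding the feature mask and the classifier parameter in one chromosome lets a single fitness evaluation judge both choices together, which is the coupling the abstract describes. To move closer to the described method, the k-NN scorer would be replaced by an SVM and the parameter gene by the SVM's own parameters.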