Chen Chao, Chen Li-Xuan, Zou Xiao-Yong, Cai Pei-Xiang
School of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou 510006, PR China.
J Theor Biol. 2008 Jul 21;253(2):388-92. doi: 10.1016/j.jtbi.2008.03.009. Epub 2008 Mar 14.
Structural class characterizes the overall folding type of a protein or its domain, and predicting protein structural class has become an important and challenging topic in protein science. Moreover, work on this problem can stimulate the development of novel predictors that may be applied directly to many other related areas. In this paper, 10 frequently used sequence-derived structural and physicochemical features, which can be easily computed by the PROFEAT (Protein Features) web server, were taken as inputs to support vector machines to develop statistical learning models for predicting protein structural class. More importantly, a strategy for merging different features, called best-first search, was developed. Rigorous jackknife cross-validation showed that our method significantly improved the success rates. We anticipate that the present method may also help boost predictive accuracy for a series of other protein attributes, such as subcellular localization, membrane type, and enzyme family and subfamily class, among many others.
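The abstract does not spell out the merging procedure, so the following is only a minimal sketch of one plausible reading: a greedy forward form of best-first search (no backtracking) that keeps adding whichever feature group most improves the jackknife (leave-one-out) success rate and stops when nothing helps. Everything here is an assumption made for illustration: the group names `feat_A`, `feat_B`, `feat_C` are hypothetical, the toy values are invented, and a nearest-centroid rule stands in for the support vector machine so that the example needs only the standard library.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    # Ties are broken by label order so the result is deterministic.
    return min(sorted(sums), key=squared_distance)


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out success rate: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def merge(groups, names):
    """Concatenate the chosen feature groups column-wise, one row per protein."""
    n_rows = len(next(iter(groups.values())))
    return [[v for name in names for v in groups[name][r]] for r in range(n_rows)]


def best_first_merge(groups, y, score=jackknife_accuracy):
    """Greedy forward merging: add the best-scoring group until the score stops improving."""
    selected, best = [], float("-inf")
    remaining = sorted(groups)
    while remaining:
        trials = [(score(merge(groups, selected + [g]), y), g) for g in remaining]
        top_score, top_group = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining group improves the success rate
        selected.append(top_group)
        remaining.remove(top_group)
        best = top_score
    return selected, best


# Invented toy data: six "proteins", two classes, three hypothetical feature groups.
labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
groups = {
    "feat_A": [[0.0], [0.5], [1.0], [1.0], [1.5], [2.0]],  # partly informative
    "feat_B": [[1.0], [1.5], [2.0], [0.0], [0.5], [1.0]],  # partly informative
    "feat_C": [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0]],  # uninformative constant
}

selected, rate = best_first_merge(groups, labels)
print(selected, rate)
```

On this toy data each informative group alone reaches a success rate of 4/6, merging `feat_A` with `feat_B` reaches 1.0, and the constant `feat_C` is never added. To move closer to the setting described in the abstract, the `predict` argument could be replaced by a function that trains an SVM, and the groups by descriptors computed with PROFEAT.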