Jian Qiu, Will Sheffler, David Baker, William Stafford Noble
Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
Proteins. 2008 May 15;71(3):1175-82. doi: 10.1002/prot.21809.
Protein structure prediction is an important problem of both intellectual and practical interest. Most protein structure prediction approaches first generate multiple candidate models and then use a scoring function to select the best model among these candidates. In this work, we develop a scoring function using support vector regression (SVR). Both consensus-based features and features computed from individual structures are extracted from a training data set containing native protein structures and predicted structural models submitted to CASP5 and CASP6. The SVR learns a scoring function that is a linear combination of these features. We test this scoring function on two data sets. First, when used to rank server models submitted to CASP7, the SVR score selects predictions that are comparable to those of the best-performing server in CASP7, Zhang-Server, and significantly better than those of all the other servers. Even when the SVR score is not allowed to select Zhang-Server models, it still selects predictions that are significantly better than those of all the other servers. In addition, the SVR selects significantly better models and yields significantly better Pearson correlation coefficients than the two best Quality Assessment groups in CASP7, QA556 (LEE) and QA634 (Pcons). Second, because this work aims to improve the ability of the Robetta server to select the best models, we evaluate the SVR score on ranking the template-based models generated by the Robetta server for the CASP7 targets. The SVR selects significantly better models than the Robetta K*Sync consensus alignment score does.
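The abstract says the SVR learns a scoring function that is a linear combination of consensus-based and single-structure features. That score is then used to pick one model per target and is evaluated by Pearson correlation. The Python sketch below is a rough illustration only. It fits a linear ε-insensitive SVR by full-batch subgradient descent on the primal objective. The solver, the hyperparameters (`epsilon`, `C`, `lr0`, `epochs`), the helper names (`train_linear_svr`, `svr_score`, `select_best`, `pearson`), and the toy feature values are all hypothetical and are not taken from the paper. The regression target is assumed to be a model-quality score measured against the native structure, for example GDT-TS.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svr(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    epsilon: float = 0.01,
    C: float = 10.0,
    lr0: float = 0.1,
    epochs: int = 3000,
) -> Tuple[List[float], float]:
    """Hypothetical solver: fit score(x) = w.x + b by minimizing
    0.5*||w||^2 + C * mean(max(0, |y - score(x)| - epsilon))
    with full-batch subgradient descent and a lr0/sqrt(t+1) step size."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for t in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            r = yi - (dot(w, xi) + b)  # true quality minus predicted score
            if abs(r) > epsilon:  # only points outside the epsilon-tube contribute
                s = -1.0 if r > 0 else 1.0  # d|r|/d(prediction) = -sign(r)
                for j in range(d):
                    grad_w[j] += C * s * xi[j] / n
                grad_b += C * s / n
        step = lr0 / math.sqrt(t + 1)
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def svr_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score: a weighted sum of the candidate model's features."""
    return dot(w, x) + b


def select_best(scores: Sequence[float]) -> int:
    """Index of the candidate model with the highest predicted score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and true quality."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Toy, made-up features per model: [consensus feature, single-structure feature].
    X_train = [[i / 10, ((7 * i) % 10) / 10] for i in range(10)]
    y_train = [0.7 * x0 + 0.3 * x1 for x0, x1 in X_train]  # hypothetical quality
    w, b = train_linear_svr(X_train, y_train)

    # Rank hypothetical candidate models for one new target.
    candidates = [[0.2, 0.9], [0.8, 0.4], [0.5, 0.5]]
    scores = [svr_score(w, b, x) for x in candidates]
    print("weights:", w, "bias:", b)
    print("selected model index:", select_best(scores))
    preds = [svr_score(w, b, x) for x in X_train]
    print("training Pearson r:", pearson(preds, y_train))
```

Because the learned score is linear, ranking the candidates for a target costs one dot product per model. The learned weights can also be read directly as the relative importance of each feature.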