Berglund Anders, Head Richard D, Welsh Eric A, Marshall Garland R
Center for Computational Biology, Washington University Medical School, St. Louis, Missouri 63110, USA.
Proteins. 2004 Feb 1;54(2):289-302. doi: 10.1002/prot.10523.
A low-resolution scoring function has been developed for selecting native and near-native structures from a set of predicted structures for a given protein sequence. The scoring function, ProVal (Protein Validate), uses several variables, each describing an aspect of protein structure for which proximity to the native structure can be assessed quantitatively. The parameters include a packing estimate, surface areas, and the contact order. Using these parameters as independent variables, a partial least squares for latent variables (PLS) model was built for each of the 28 decoy sets of candidate structures generated for 22 different proteins. The C(alpha) RMS of each candidate structure versus the experimental structure was used as the dependent variable. The final generalized scoring function was the average of all derived models, ensuring that the function was not optimized for specific fold classes or for the method used to generate the candidate folds. The crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In every case in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although its inherently low resolution prevents ProVal from distinguishing between predicted structures of similar overall fold quality, it can clearly be used as a primary filter that excludes approximately 90% of the fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual C(alpha) RMS values varies considerably between the candidate fold sets.
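The workflow summarized in the abstract (one PLS model per decoy set, a consensus score averaged over all models, and a primary filter that keeps roughly the top 10% of candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three descriptors are synthetic stand-ins for packing, surface area, and contact order; the autoscaling step, the number of PLS components, and the choice to average model predictions (rather than coefficients) are assumptions; and all function names are hypothetical.

```python
import math
import random

def fit_pls1(X, y, n_components=2):
    """Fit a single-response PLS model with NIPALS on autoscaled descriptors."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    # Sample standard deviation per descriptor; fall back to 1.0 if constant.
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / (n - 1)) or 1.0
          for j in range(p)]
    y_mean = sum(y) / n
    E = [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate descriptors and response before the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return {"mu": mu, "sd": sd, "y_mean": y_mean, "comps": comps}

def predict_pls1(model, x):
    """Predict the response (here: C-alpha RMS) for one descriptor vector."""
    e = [(x[j] - model["mu"][j]) / model["sd"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, load, q in model["comps"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat

def consensus_score(models, x):
    """Average the predictions of all per-set models (assumed reading of
    'an average of all models'); lower means closer to native."""
    return sum(predict_pls1(m, x) for m in models) / len(models)

def primary_filter(models, candidates, keep_fraction=0.10):
    """Return indices of the best-scoring fraction of candidates, plus scores."""
    scores = [consensus_score(models, x) for x in candidates]
    order = sorted(range(len(candidates)), key=scores.__getitem__)
    n_keep = max(1, math.ceil(keep_fraction * len(candidates)))
    return order[:n_keep], scores

def make_synthetic_set(rng, n=60):
    """Synthetic decoy set: descriptors drift linearly with RMS plus noise.
    Purely illustrative; these are not ProVal's descriptors or data."""
    X, y = [], []
    for _ in range(n):
        rms = rng.uniform(0.0, 12.0)
        X.append([
            1.0 - 0.05 * rms + rng.gauss(0, 0.03),   # packing-like
            100.0 + 8.0 * rms + rng.gauss(0, 5.0),   # surface-area-like
            0.30 - 0.01 * rms + rng.gauss(0, 0.01),  # contact-order-like
        ])
        y.append(rms)
    return X, y

rng = random.Random(0)
models = [fit_pls1(*make_synthetic_set(rng)) for _ in range(5)]
test_X, test_rms = make_synthetic_set(rng, n=50)
kept, scores = primary_filter(models, test_X)
```

Here `kept` holds the indices of the candidates that would be passed on to all-atom modeling, and the remaining candidates are discarded. In practice the models would be fitted on real decoy sets, and the kept fraction would be tuned to the behavior observed on those sets.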