Wang Kai, Fain Boris, Levitt Michael, Samudrala Ram
Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA.
BMC Struct Biol. 2004 Jun 18;4:8. doi: 10.1186/1472-6807-4-8.
A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations.
We present a simple, nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Cα root mean square deviation (RMSD) calculation within a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Cα RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy set. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, in which the atom-atom contact probabilities are compiled from all the conformations in a decoy set instead of from an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Cα RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement.
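The all-against-all comparison behind the density score can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes that a decoy's density score is its mean Cα RMSD to every other decoy in the set, with each pairwise RMSD computed after optimal rigid-body superposition (the Kabsch method, chosen here for illustration). The function names `kabsch_rmsd` and `density_scores` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Minimum RMSD between two (n, 3) coordinate arrays after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both coordinate sets on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the optimal transform would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    sq_err = (p ** 2).sum() + (q ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sq_err, 0.0) / len(p)))


def density_scores(decoys):
    """Assumed definition: mean pairwise RMSD of each decoy to all other decoys."""
    m = len(decoys)
    rmsd = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            rmsd[i, j] = rmsd[j, i] = kabsch_rmsd(decoys[i], decoys[j])
    # The diagonal is zero, so dividing the row sum by (m - 1) averages over the others.
    return rmsd.sum(axis=1) / (m - 1)
```

Under this assumed definition, a low score marks a decoy that sits in a densely populated region of the decoy set, and no native structure is needed to compute it. Each decoy would be passed as an array of Cα coordinates with the same residue ordering across the set.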
Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.