Li Yaohang, Rata Ionel, Chiu See-wing, Jakobsson Eric
Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
BMC Struct Biol. 2010 Jul 20;10:22. doi: 10.1186/1472-6807-10-22.
Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from misfolded ones is a critical step in protein loop structure prediction.
We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up approximately 20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. Similar effectiveness of the POC method is also found in decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
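The two-step procedure described above (Pareto front selection, then ranking by fuzzy dominance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that lower scores are better for every scoring function, and because the abstract does not give the fuzzy dominance formula, the `fuzzy_dominance` measure below (the fraction of scoring functions on which one model beats another) is a hypothetical stand-in. All function names and the toy scores are illustrative.

```python
from typing import List, Sequence

Scores = Sequence[float]  # one value per scoring function; lower is assumed better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every score and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(models: Sequence[Scores]) -> List[int]:
    """Indices of models that no other model dominates."""
    return [
        i
        for i, a in enumerate(models)
        if not any(dominates(b, a) for j, b in enumerate(models) if j != i)
    ]


def fuzzy_dominance(a: Scores, b: Scores) -> float:
    """ASSUMPTION (not from the source): fraction of scoring functions on which a beats b."""
    return sum(x < y for x, y in zip(a, b)) / len(a)


def rank_front(models: Sequence[Scores]) -> List[int]:
    """Rank Pareto-optimal models by mean fuzzy dominance over all other models."""

    def mean_dominance(i: int) -> float:
        others = [b for j, b in enumerate(models) if j != i]
        if not others:
            return 0.0
        return sum(fuzzy_dominance(models[i], b) for b in others) / len(others)

    # Highest mean dominance first; sorted() is stable, so ties keep index order.
    return sorted(pareto_front(models), key=mean_dominance, reverse=True)


if __name__ == "__main__":
    # Toy decoy set: each row holds made-up scores from three scoring functions.
    decoys = [
        [0.5, 4.0, 5.0],
        [2.0, 1.0, 3.5],
        [3.0, 3.0, 4.0],
        [1.0, 2.0, 3.0],
    ]
    print("Pareto front:", pareto_front(decoys))
    print("Ranked front:", rank_front(decoys))
```

In this toy set the third decoy is dominated by the fourth on all three scores, so it drops out of the front. The remaining decoys are then ordered by how consistently they outscore the rest of the set. The pairwise comparison is quadratic in the number of decoys, which is acceptable for a sketch but may need optimization for large decoy sets.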
By integrating multiple knowledge- and physics-based scoring functions through Pareto optimality and fuzzy dominance, the POC method effectively distinguishes the best loop models from the others within a loop model set.