Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Author information

Li Yaohang, Rata Ionel, Chiu See-wing, Jakobsson Eric

Affiliation

Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.

Publication information

BMC Struct Biol. 2010 Jul 20;10:22. doi: 10.1186/1472-6807-10-22.

DOI:10.1186/1472-6807-10-22

PMID:20642859

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2914074/

Abstract

BACKGROUND

Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

RESULTS

We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up approximately 20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. The POC method is similarly effective on decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
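The two-step procedure can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the fuzzy dominance measure, so the sketch assumes a simple stand-in (the fraction of score comparisons a model wins against all other models). The function names, the convention that lower scores are better, and the toy score table are all hypothetical.

```python
# Toy score table: one row per decoy, one column per scoring function.
# Assumption: lower is better for every scoring function.
scores = [
    [1.0, 5.0, 3.0],  # decoy 0
    [2.0, 2.0, 2.0],  # decoy 1
    [3.0, 3.0, 3.0],  # decoy 2 (worse than decoy 1 everywhere)
    [5.0, 1.0, 4.0],  # decoy 3
    [4.0, 6.0, 5.0],  # decoy 4 (worse than decoy 0 everywhere)
]


def dominates(a, b):
    """True if a is no worse than b on every score and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(table):
    """Step 1: indices of decoys that no other decoy dominates."""
    return [i for i, row in enumerate(table)
            if not any(dominates(other, row) for other in table)]


def fuzzy_dominance(table, i):
    """Assumed stand-in measure: share of (other decoy, scoring function)
    comparisons in which decoy i has the strictly better score."""
    wins = 0
    total = 0
    for j, other in enumerate(table):
        if j == i:
            continue
        wins += sum(x < y for x, y in zip(table[i], other))
        total += len(other)
    return wins / total


def poc_rank(table):
    """Step 2: order the Pareto-optimal decoys by decreasing fuzzy dominance."""
    front = pareto_front(table)
    return sorted(front, key=lambda i: fuzzy_dominance(table, i), reverse=True)


front = pareto_front(scores)
ranking = poc_rank(scores)
print(front)    # [0, 1, 3]
print(ranking)  # [1, 0, 3]
```

In this toy set, decoys 2 and 4 are dominated and drop out in step 1. Decoy 1 is never the best on any single scoring function, yet it ranks first in step 2 because it wins the most pairwise comparisons overall, which is the kind of balanced candidate a consensus method is meant to surface.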

CONCLUSIONS

By integrating multiple knowledge- and physics-based scoring functions through Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the rest of a loop model set.
