Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA.
BMC Bioinformatics. 2010 Mar 12;11:128. doi: 10.1186/1471-2105-11-128.
Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment.
The performance of a number of publicly available scoring functions is compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored how accuracy is affected by alternative choices for representing interaction center types and by other features of scoring functions, such as using information on solvent accessibility and torsion angles, and accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. The atom- and residue-level statistical potentials proposed in this work, and Linux executables to calculate the energy of a given protein, can be downloaded from http://www.fiserlab.org/potentials.
Among the most influential terms, we observed a critical role of a proper reference state definition and a benefit from including information about the microenvironment of interaction centers. Molecular mechanics potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before energy scores started to correlate with model quality.
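To illustrate what a shuffled reference state can mean in practice, the following is a minimal Python sketch of a residue-level contact potential. It is not the authors' implementation. It assumes a simple inverse Boltzmann form, E = -ln(N_obs / N_exp) in units of kT, where the expected counts are obtained by randomly permuting residue identities over a fixed set of contacts. The function names, the pseudocount, the number of shuffles, and the toy sequence and contact list are all hypothetical. Side chain orientation, which the proposed potential takes into account, is not modeled here, and a real potential would be derived from a large database of structures rather than a single toy example.

```python
import math
import random
from collections import Counter


def pair_counts(sequence, contacts):
    """Count unordered residue-type pairs over a list of (i, j) contacts."""
    counts = Counter()
    for i, j in contacts:
        counts[tuple(sorted((sequence[i], sequence[j])))] += 1
    return counts


def shuffled_reference(sequence, contacts, n_shuffles=200, seed=0):
    """Expected pair counts when residue identities are randomly permuted.

    The contact geometry is kept fixed; only the residue labels are
    shuffled (an assumed, simplified reading of a shuffled reference state).
    """
    rng = random.Random(seed)
    totals = Counter()
    shuffled = list(sequence)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        totals.update(pair_counts(shuffled, contacts))
    return {pair: n / n_shuffles for pair, n in totals.items()}


def contact_energies(sequence, contacts, pseudocount=1.0, n_shuffles=200, seed=0):
    """Inverse Boltzmann energies (in kT) per residue-type pair."""
    observed = pair_counts(sequence, contacts)
    expected = shuffled_reference(sequence, contacts, n_shuffles, seed)
    energies = {}
    for pair in set(observed) | set(expected):
        n_obs = observed.get(pair, 0) + pseudocount   # pseudocount avoids log(0)
        n_exp = expected.get(pair, 0.0) + pseudocount
        energies[pair] = -math.log(n_obs / n_exp)
    return energies


# Toy example (hypothetical data): leucines contact each other,
# and no leucine-lysine contact is observed.
toy_sequence = "LLLLKKKK"
toy_contacts = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]
toy_energies = contact_energies(toy_sequence, toy_contacts)
```

In this toy case, pairs observed more often than under shuffling (L-L) receive a negative, favorable energy, while pairs observed less often (K-L) receive a positive one. The score of a model would then be the sum of these pair energies over its contacts.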