Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA.
Proc Natl Acad Sci U S A. 2010 Nov 16;107(46):19867-72. doi: 10.1073/pnas.1006428107. Epub 2010 Nov 1.
Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.
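The validation strategy described in the abstract can be illustrated with a small toy calculation: generate a structure databank from a known model potential, then recover that potential from the databank alone. This is a minimal sketch under stated assumptions, not the authors' method. It assumes a set of hypothetical "proteins", each with a small enumerable set of conformations summarized by contact counts, an energy that is linear in the unknown transferable parameters, and kT = 1. In place of the paper's own working equations, it fits the shared parameters by maximum likelihood, which is equivalent to minimizing the relative entropy between each protein's databank distribution and its model Boltzmann distribution. Each protein keeps its own partition function summed over whole conformations, so correlations between contacts within a conformation are accounted for rather than fitting each contact type independently from its frequency. All names (THETA_TRUE, features, fit, and so on) are illustrative.

```python
import math
import random

# Hypothetical toy setup, not the paper's model. Energies are in units of kT.
N_PROTEINS = 20    # number of toy "proteins" in the databank
N_CONFS = 50       # enumerable conformations per protein
N_FEATURES = 3     # contact counts per conformation, e.g. (HH, HP, PP)
N_SAMPLES = 2000   # databank structures drawn per protein
THETA_TRUE = [-0.5, -0.2, 0.1]  # hypothetical model potential to be recovered

rng = random.Random(0)
# features[k][c][j]: count of contact type j in conformation c of protein k
features = [[[rng.randint(0, 5) for _ in range(N_FEATURES)]
             for _ in range(N_CONFS)]
            for _ in range(N_PROTEINS)]

def shifted_weights(theta, feats):
    """Return (U_min, [exp(-(U - U_min))]) for U(c) = sum_j theta_j * phi_j(c)."""
    e = [sum(t * f for t, f in zip(theta, phi)) for phi in feats]
    e_min = min(e)  # shift for numerical stability
    return e_min, [math.exp(-(ei - e_min)) for ei in e]

def boltzmann(theta, feats):
    """Equilibrium probabilities exp(-U)/Z over one protein's conformations."""
    _, w = shifted_weights(theta, feats)
    z = sum(w)
    return [wi / z for wi in w]

def log_z(theta, feats):
    """log Z = -U_min + log(sum of shifted weights)."""
    e_min, w = shifted_weights(theta, feats)
    return -e_min + math.log(sum(w))

# Step 1: build the databank by sampling each protein's own equilibrium ensemble.
databank = [rng.choices(range(N_CONFS), weights=boltzmann(THETA_TRUE, feats), k=N_SAMPLES)
            for feats in features]
# Databank average of each contact count, per protein.
data_means = [[sum(feats[c][j] for c in sample) / N_SAMPLES for j in range(N_FEATURES)]
              for feats, sample in zip(features, databank)]

def objective(theta):
    """Negative log-likelihood per structure, averaged over proteins."""
    return sum(sum(t * m for t, m in zip(theta, mean)) + log_z(theta, feats)
               for feats, mean in zip(features, data_means)) / N_PROTEINS

def grad_and_hessian(theta):
    """Gradient <phi>_data - <phi>_model and Hessian Cov_model, averaged over proteins."""
    g = [0.0] * N_FEATURES
    h = [[0.0] * N_FEATURES for _ in range(N_FEATURES)]
    for feats, mean in zip(features, data_means):
        p = boltzmann(theta, feats)
        mu = [sum(pc * phi[j] for pc, phi in zip(p, feats)) for j in range(N_FEATURES)]
        for i in range(N_FEATURES):
            g[i] += (mean[i] - mu[i]) / N_PROTEINS
            for j in range(N_FEATURES):
                second = sum(pc * phi[i] * phi[j] for pc, phi in zip(p, feats))
                h[i][j] += (second - mu[i] * mu[j]) / N_PROTEINS
    return g, h

def solve(a, b):
    """Solve a small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(max_iter=50, tol=1e-9):
    """Step 2: recover theta from the databank alone by damped Newton iteration."""
    theta = [0.0] * N_FEATURES
    for _ in range(max_iter):
        g, h = grad_and_hessian(theta)
        if max(abs(gi) for gi in g) < tol:
            break
        step = solve(h, g)  # Newton step H^{-1} g; the update is theta - t * step
        f0, t = objective(theta), 1.0
        while objective([th - t * s for th, s in zip(theta, step)]) > f0 and t > 1e-8:
            t *= 0.5  # backtrack until the objective does not increase
        theta = [th - t * s for th, s in zip(theta, step)]
    return theta

theta_fit = fit()
print("true:     ", THETA_TRUE)
print("recovered:", [round(t, 3) for t in theta_fit])
```

The objective is convex in theta because its Hessian is an average of covariance matrices, so the damped Newton iteration should converge to a unique optimum. With 2,000 sampled structures per protein, the recovered parameters should lie close to THETA_TRUE; any residual difference reflects finite sampling of the databank.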