Department of Molecular and Integrative Physiology, UIUC Program in Biophysics, National Center for Supercomputing Applications, and Beckman Institute, University of Illinois, Urbana, Illinois 61801, USA.
J Phys Chem B. 2010 Feb 11;114(5):1859-69. doi: 10.1021/jp909874g.
Native proteins have been optimized by evolution simultaneously for structure and sequence. Structural databases reflect this interdependency. In this paper, we present a new statistical potential for a reduced backbone representation that has both structure and sequence characteristics as variables. We use information from structural data available in the Protein Coil Library, selected on the basis of resolution and refinement factor. In these structures, the nonlocal interactions are randomly distributed and thus average out in the statistics, so structural propensities due to local backbone-based interactions can be studied separately. We collect data in the form of local sequence-specific phi-psi backbone dihedral pairs. From these data, we construct dihedral probability density functions (DPDFs) that quantify the distribution of any adjacent phi-psi pair in the context of all possible combinations of local residue types. We use a probabilistic analysis to deduce how the correlations encoded in the various DPDFs, as well as in residue frequencies, propagate along the sequence and can be accumulated into a statistical potential capable of efficiently scoring a loop from its backbone conformation and sequence alone. Our potential is able to identify with high accuracy the native structure of a loop with a given sequence among possible alternative conformations from sets of well-constructed decoys. Conversely, the potential can also be used for sequence prediction problems and is shown to rank the native sequence of a given loop structure among the fittest of the possible sequence combinations. Applications for both structure prediction and sequence design are discussed.
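The scoring idea summarized in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' potential: it conditions each phi-psi pair on a single residue type only (the paper uses local sequence context and residue frequencies), it assumes a uniform reference state, and it uses a hypothetical 30-degree histogram grid with pseudocounts. All function names and the toy samples below are invented for illustration.

```python
import math
from collections import defaultdict

BIN_DEG = 30                 # hypothetical bin width in degrees (assumption)
N_BINS = 360 // BIN_DEG      # bins per dihedral axis


def bin_index(angle_deg):
    """Map an angle in degrees onto a periodic bin index in [0, N_BINS)."""
    return int(((angle_deg + 180.0) % 360.0) // BIN_DEG)


def build_dpdf(samples, pseudocount=1.0):
    """Estimate P(phi_bin, psi_bin | residue) from (residue, phi, psi) tuples.

    Simplification: one residue type per density, no neighbor context.
    """
    counts = defaultdict(
        lambda: [[pseudocount] * N_BINS for _ in range(N_BINS)]
    )
    for res, phi, psi in samples:
        counts[res][bin_index(phi)][bin_index(psi)] += 1.0
    dpdf = {}
    for res, grid in counts.items():
        total = sum(sum(row) for row in grid)
        dpdf[res] = [[c / total for c in row] for row in grid]
    return dpdf


def score_loop(dpdf, sequence, dihedrals):
    """Sum of -log(P / P_uniform) over residues; lower means more favorable."""
    uniform = 1.0 / (N_BINS * N_BINS)   # assumed reference state
    score = 0.0
    for res, (phi, psi) in zip(sequence, dihedrals):
        grid = dpdf.get(res)
        # Residue types never seen in the data contribute a neutral term.
        p = grid[bin_index(phi)][bin_index(psi)] if grid else uniform
        score -= math.log(p / uniform)
    return score


# Toy samples, invented for illustration only (not Protein Coil Library data).
samples = (
    [("A", -60.0, -45.0)] * 50
    + [("A", -120.0, 130.0)] * 10
    + [("G", 80.0, 10.0)] * 30
)
dpdf = build_dpdf(samples)

conformation = [(-60.0, -45.0), (80.0, 10.0)]
native = score_loop(dpdf, "AG", conformation)
decoy = score_loop(dpdf, "AG", [(60.0, 60.0), (-150.0, -150.0)])
swapped = score_loop(dpdf, "GA", conformation)
print(native < decoy, native < swapped)
```

The same function serves both directions described above: holding the sequence fixed and varying the dihedrals ranks candidate conformations, while holding the dihedrals fixed and varying the sequence ranks candidate sequences. A fuller treatment of the sequence direction would also need residue frequencies, which this sketch omits.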