Klimov D K, Thirumalai D
Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA.
Proteins. 1996 Dec;26(4):411-41. doi: 10.1002/(SICI)1097-0134(199612)26:4<411::AID-PROT4>3.0.CO;2-E.
We use a three-dimensional lattice model of proteins to investigate systematically the global properties of polypeptide chains that determine folding to the native conformation starting from an ensemble of denatured conformations. In the coarse-grained description, the polypeptide chain is modeled as a heteropolymer consisting of N beads confined to the vertices of a simple cubic lattice. The interactions between the beads are taken from a random Gaussian distribution of energies, with a mean value B0 < 0 that corresponds to the overall average hydrophobic interaction energy. We studied 56 sequences, all with a unique ground state (native conformation), covering two values of N (15 and 27) and two values of B0. The smaller magnitude of B0 was chosen so that the average fraction of hydrophobic residues corresponds to that found in natural proteins. The larger magnitude of B0 was selected with the expectation that only the fully compact conformations would contribute to the thermodynamic behavior. For N = 15 the entire conformation space (compact as well as noncompact structures) can be exhaustively enumerated, so that the thermodynamic properties can be computed exactly at all temperatures. The thermodynamic properties of the 27-mer chain were calculated using the slow cooling technique together with standard Monte Carlo simulations. The kinetics of approach to the native state for all the sequences was obtained using Monte Carlo simulations. For all sequences we find that there are two intrinsic characteristic temperatures, namely Tθ and Tf. At the temperature Tθ the polypeptide chain makes a transition to a collapsed structure, while at Tf the chain undergoes a transition to the native conformation. We show that foldability of sequences can be characterized entirely in terms of these two temperatures.
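As an illustration of the lattice model described above, the following is a minimal Python sketch and not the authors' code. It assumes that only non-bonded beads on adjacent lattice sites interact and that the Gaussian has unit standard deviation; the abstract states neither. The function names, the `spread` and `seed` parameters, and the value of `b0` are hypothetical choices made for this example.

```python
import random

def make_contact_energies(n, b0, spread=1.0, seed=0):
    """Draw a symmetric table of pair energies from a Gaussian with mean b0.

    The standard deviation `spread` is an assumption of this sketch.
    """
    rng = random.Random(seed)
    energies = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            energies[i][j] = energies[j][i] = rng.gauss(b0, spread)
    return energies

def lattice_distance(p, q):
    """Number of lattice steps between two sites of a simple cubic lattice."""
    return sum(abs(a - b) for a, b in zip(p, q))

def is_valid_chain(coords):
    """True if no site is occupied twice and consecutive beads are one step apart."""
    if len(set(coords)) != len(coords):
        return False
    return all(lattice_distance(p, q) == 1 for p, q in zip(coords, coords[1:]))

def chain_energy(coords, energies):
    """Sum pair energies over non-bonded beads sitting on adjacent sites.

    On a cubic lattice the nearest beads along the chain that can touch
    are i and i + 3, so the inner loop starts there.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 3, n):
            if lattice_distance(coords[i], coords[j]) == 1:
                total += energies[i][j]
    return total

# A four-bead U-turn: beads 0 and 3 form the only contact.
energies = make_contact_energies(4, b0=-0.1)  # b0 is an arbitrary illustrative value
u_turn = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(is_valid_chain(u_turn), chain_energy(u_turn, energies))
```

For a short chain such as N = 15, a routine like `chain_energy` could be evaluated over every valid conformation to obtain Boltzmann-weighted averages, which is the kind of exhaustive enumeration the abstract refers to.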
It is shown that fast folding sequences have small values of σ = (Tθ - Tf)/Tθ, whereas slow folders have larger values of σ (the range of σ is 0 < σ < 1). The calculated values of the folding times correlate extremely well with σ. An increase in σ from 0.1 to 0.7 can result in an increase of 5-6 orders of magnitude in folding times. In contrast, we demonstrate that there is no useful correlation between folding times and the energy gap between the native conformation and the first excited state at any N for any value of B0. In particular, in the parameter space of the model, many sequences with varying energy gaps, all with roughly the same folding time, can be easily engineered. Folding sequences in this model, therefore, can be classified based solely on the value of σ. Fast folders have small values of σ (typically less than about 0.1), moderate folders have values of σ in the approximate range between 0.1 and 0.6, while for slow folders σ exceeds 0.6. The precise boundary between these categories depends crucially on N and on the model. The range of σ for fast folders decreases with the length of the chain. At temperatures close to Tf, fast folders reach the native conformation via a native conformation nucleation collapse mechanism without forming any detectable intermediates, whereas for moderate folders only a fraction φ(T) of the molecules reaches the native conformation by this process. The remaining fraction reaches the native state via a three-stage multipathway process. For slow folders φ(T) is close to zero at all temperatures. The simultaneous requirement of native state stability and kinetic accessibility can be achieved at high enough temperatures for those sequences with small values of σ. The utility of these results for de novo design of proteins is briefly discussed.
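The classification by σ can be written out as a short Python sketch. The formula and the approximate cut-offs of 0.1 and 0.6 come from the abstract; the function names and the handling of values that fall exactly on a boundary are assumptions of this example. Because the abstract notes that the boundaries depend on N and on the model, they are exposed as parameters rather than fixed.

```python
def sigma(t_theta, t_f):
    """Return sigma = (T_theta - T_f) / T_theta.

    Requires 0 < T_f < T_theta, which keeps sigma strictly between 0 and 1.
    """
    if not 0 < t_f < t_theta:
        raise ValueError("expected 0 < t_f < t_theta")
    return (t_theta - t_f) / t_theta

def classify_folder(sigma_value, fast_max=0.1, slow_min=0.6):
    """Label a sequence as a fast, moderate, or slow folder from its sigma.

    The default thresholds are the approximate values quoted in the abstract;
    treating the boundary values themselves as "moderate" is an assumption.
    """
    if sigma_value < fast_max:
        return "fast"
    if sigma_value <= slow_min:
        return "moderate"
    return "slow"

# Illustrative temperatures in arbitrary units, not data from the paper.
for t_theta, t_f in [(1.0, 0.95), (1.0, 0.5), (1.0, 0.3)]:
    s = sigma(t_theta, t_f)
    print(t_theta, t_f, classify_folder(s))
```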