Kim David E, Blum Ben, Bradley Philip, Baker David
Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
J Mol Biol. 2009 Oct 16;393(1):249-60. doi: 10.1016/j.jmb.2009.07.063. Epub 2009 Jul 28.
The primary obstacle to de novo protein structure prediction is conformational sampling: the native state generally has lower free energy than nonnative structures but is exceedingly difficult to locate. Structure predictions with atomic level accuracy have been made for small proteins using the Rosetta structure prediction method, but for larger and more complex proteins, the native state is virtually never sampled, and it has been unclear how much of an increase in computing power would be required to successfully predict the structures of such proteins. In this paper, we develop an approach to determining how much computer power is required to accurately predict the structure of a protein, based on a reformulation of the conformational search problem as a combinatorial sampling problem in a discrete feature space. We find that conformational sampling for many proteins is limited by critical "linchpin" features, often the backbone torsion angles of individual residues, which are sampled very rarely in unbiased trajectories and, when constrained, dramatically increase the sampling of the native state. These critical features frequently occur in less regular and likely strained regions of proteins that contribute to protein function. In a number of proteins, the linchpin features are in regions found experimentally to form late in folding, suggesting a correspondence between folding in silico and in reality.
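The combinatorial view described above can be illustrated with a small back-of-the-envelope calculation. This is only a sketch under a strong simplifying assumption that is not taken from the paper: each native feature is treated as being sampled independently with a fixed per-trajectory frequency. The feature names and frequencies are hypothetical, and the functions are not part of Rosetta.

```python
import math

def native_probability(feature_freqs):
    """Probability that one trajectory samples every native feature,
    ASSUMING the features are sampled independently (illustrative only)."""
    p = 1.0
    for f in feature_freqs:
        p *= f
    return p

def trajectories_needed(p_native, confidence=0.95):
    """Number of independent trajectories needed to sample the native
    state at least once with the given confidence."""
    if p_native <= 0.0:
        return math.inf
    if p_native >= 1.0:
        return 1
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_native))

# Hypothetical per-trajectory frequencies for three native features;
# the third plays the role of a rarely sampled "linchpin" torsion angle.
freqs = {"res12_torsion": 0.5, "res40_torsion": 0.4, "res57_torsion": 0.001}

unbiased = native_probability(freqs.values())
# Constraining the linchpin is modeled as fixing its frequency at 1.
constrained = native_probability(
    f for name, f in freqs.items() if name != "res57_torsion"
)

print(trajectories_needed(unbiased), trajectories_needed(constrained))
```

In this toy setting, a single rarely sampled feature dominates the estimated cost, and fixing it reduces the required number of trajectories by roughly the inverse of its sampling frequency.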