TC Jenkins. Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA.
Protein Sci. 2012 Aug;21(8):1231-40. doi: 10.1002/pro.2106. Epub 2012 Jul 6.
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.
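The structure-to-string reduction described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: each residue is represented by its backbone (phi, psi) dihedral pair, a basin is assigned by nearest center, and the "conventional sequence-based comparison" is a plain Needleman-Wunsch global alignment score. The four basin centers below are hypothetical placeholders for common backbone regions, not the 11 basins derived in the paper, and all function names are invented for illustration.

```python
# Hypothetical basin centers (phi, psi) in degrees. Placeholders only:
# these are NOT the 11 basins derived in the paper.
BASIN_CENTERS = {
    "A": (-63.0, -43.0),   # right-handed helical region
    "B": (-120.0, 130.0),  # extended (beta) region
    "P": (-75.0, 145.0),   # polyproline II region
    "L": (57.0, 47.0),     # left-handed helical region
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def to_basin_string(dihedrals):
    """Map each (phi, psi) pair to the letter of its nearest basin center."""
    letters = []
    for phi, psi in dihedrals:
        nearest = min(
            BASIN_CENTERS,
            key=lambda k: angular_diff(phi, BASIN_CENTERS[k][0]) ** 2
            + angular_diff(psi, BASIN_CENTERS[k][1]) ** 2,
        )
        letters.append(nearest)
    return "".join(letters)


def alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of two basin strings."""
    prev = [j * gap for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Made-up dihedral angles for a four-residue target and candidate.
target = to_basin_string([(-60, -45), (-65, -40), (-118, 128), (60, 45)])
candidate = to_basin_string([(-62, -41), (-64, -44), (-77, 147), (55, 50)])
print(target, candidate)                   # AABL AAPL
print(alignment_score(target, candidate))  # 2
```

Once both structures are strings, the comparison no longer involves superposition or coordinate distances. Its cost is that of ordinary string alignment, which is the simplification the abstract points to.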