Departments of Biochemistry and Physics, Stanford University, Stanford, California, United States of America.
PLoS One. 2013 Oct 21;8(10):e74830. doi: 10.1371/journal.pone.0074830. eCollection 2013.
Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable when loop lengths exceed 10 residues and surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, which simplifies modeling but precludes a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12-residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high-accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA produces models with sub-Angstrom accuracy, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition.
These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high-performance computing and physically realistic energy functions to achieve atomic accuracy more consistently.
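The residue-by-residue enumeration described above can be illustrated with a minimal toy sketch. It is not the Rosetta SWA implementation: the coarse torsion bins in `STATES`, the scoring function `toy_energy`, and the pruning parameter `keep` are all hypothetical stand-ins, and the loop is grown in one direction only. The sketch shows only the general idea of extending every retained partial model by one residue, enumerating all states for the new residue, and carrying the lowest-energy partial models forward.

```python
from itertools import product

# Hypothetical coarse torsion bins (degrees); a stand-in for all-atom sampling.
STATES = (-60, 60, 180)


def toy_energy(conf):
    """Toy score, NOT the Rosetta energy: a per-residue bias plus a
    nearest-neighbor coupling term. Works on partial and full models."""
    local = sum(((i + 1) * s) % 7 for i, s in enumerate(conf))
    coupling = sum(abs(a - b) // 60 for a, b in zip(conf, conf[1:]))
    return local + coupling


def stepwise_build(n_residues, keep):
    """Grow the loop one residue at a time. At each step, enumerate every
    state for the new residue on every retained partial model, then keep
    only the `keep` lowest-energy partial models for the next step."""
    pool = [()]  # start from the empty loop
    for _ in range(n_residues):
        extended = [conf + (s,) for conf in pool for s in STATES]
        extended.sort(key=toy_energy)
        pool = extended[:keep]
    return pool[0]


def brute_force(n_residues):
    """Reference answer: score every full-length conformation."""
    return min(product(STATES, repeat=n_residues), key=toy_energy)


if __name__ == "__main__":
    n = 6
    best = brute_force(n)
    pruned = stepwise_build(n, keep=4)
    print("brute force:", best, toy_energy(best))
    print("stepwise   :", pruned, toy_energy(pruned))
```

In this sketch each step scores at most `keep * len(STATES)` partial models, whereas the brute-force reference scores `len(STATES) ** n_residues` full models. With aggressive pruning the stepwise result is not guaranteed to match the global minimum of the toy score; retaining every partial model recovers it exactly.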