Zhang Yang, Kihara Daisuke, Skolnick Jeffrey
Laboratory of Computational Genomics, Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA.
Proteins. 2002 Aug 1;48(2):192-201. doi: 10.1002/prot.10141.
Among the major difficulties in protein structure prediction is the roughness of the energy landscape that must be searched for the global energy minimum. To address this issue, we have developed a novel Monte Carlo algorithm called parallel hyperbolic sampling (PHS) that logarithmically flattens local high-energy barriers and therefore allows the simulation to tunnel more efficiently through energetically inaccessible regions to low-energy valleys. Here, we show the utility of this approach by applying it to the SICHO (SIde-CHain-Only) protein model. For the same CPU time, the parallel hyperbolic sampling method can identify states of much lower energy and explore a larger region of phase space than the commonly used replica sampling (RS) Monte Carlo method. By clustering the simulated structures obtained in the PHS implementation of the SICHO model, we successfully predict 50 of the 65 proteins in a representative benchmark set, where a prediction counts as successful if one of the top 5 clusters has a root-mean-square deviation (RMSD) from the native structure below 6.5 Å. Compared with our previous calculations that used RS as the conformational search procedure, the number of successful predictions increased by four and the CPU cost is reduced. By comparing the structure clusters produced by PHS and RS, we find a strong correlation between the quality of the predicted structures and the minimum relative RMSD (mrRMSD) of the structure clusters identified by the different search engines. This mrRMSD correlation may be useful in blind prediction as an indicator of the likelihood that a fold has been predicted successfully.
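The abstract does not state the exact PHS energy transformation, so the following is only a minimal sketch under stated assumptions of how a Monte Carlo move can "logarithmically flatten" high barriers. It assumes a hypothetical transform that leaves energies below a threshold `e0` unchanged and maps the excess above it to `delta * asinh(excess / delta)`, which grows roughly like a logarithm for large excesses. It also assumes that "parallel" means replica-exchange-style swaps between temperatures, as in the RS method it is compared against. The names `flattened_energy`, `metropolis_accept`, `try_swap`, `e0`, and `delta` are illustrative and are not taken from the paper.

```python
import math
import random


def flattened_energy(e: float, e0: float, delta: float) -> float:
    """Hypothetical barrier-flattening transform (an assumption, not the paper's formula).

    Energies at or below the threshold e0 are unchanged. Above e0, the excess
    x = e - e0 becomes delta * asinh(x / delta): about x when x << delta and
    about delta * ln(2x / delta) when x >> delta, so tall barriers shrink
    to roughly logarithmic height.
    """
    if e <= e0:
        return e
    return e0 + delta * math.asinh((e - e0) / delta)


def metropolis_accept(e_old: float, e_new: float, temperature: float,
                      e0: float, delta: float, rng: random.Random) -> bool:
    """Metropolis criterion evaluated on the flattened energies."""
    d = flattened_energy(e_new, e0, delta) - flattened_energy(e_old, e0, delta)
    return d <= 0 or rng.random() < math.exp(-d / temperature)


def try_swap(e_i: float, e_j: float, t_i: float, t_j: float,
             e0: float, delta: float, rng: random.Random) -> bool:
    """Replica-exchange swap test between two temperatures (assumed, by analogy
    with parallel tempering): accept with min(1, exp((1/t_i - 1/t_j)(h_i - h_j)))."""
    h_i = flattened_energy(e_i, e0, delta)
    h_j = flattened_energy(e_j, e0, delta)
    arg = (1.0 / t_i - 1.0 / t_j) * (h_i - h_j)
    return arg >= 0 or rng.random() < math.exp(arg)


if __name__ == "__main__":
    # Compare an uphill move over a barrier of height 100 at temperature 1.
    raw = 100.0
    flat = flattened_energy(raw, e0=0.0, delta=1.0)
    print(f"barrier {raw:.1f} -> {flat:.2f}")
    print(f"raw acceptance {math.exp(-raw):.3e}, flattened {math.exp(-flat):.3e}")
```

Under this assumed transform, an uphill step of 100 energy units, which plain Metropolis at unit temperature would essentially never accept, is reduced to a step of about 5.3 units. That illustrates why a flattened landscape lets the walk cross barriers that are otherwise inaccessible. The low-energy region below `e0` keeps its original shape, so the ordering of deep minima is preserved.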