Holtby Daniel, Li Shuai Cheng, Li Ming
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada.
J Comput Biol. 2013 Mar;20(3):212-23. doi: 10.1089/cmb.2012.0078.
Modeling loops is a necessary step in protein structure determination, even with experimental nuclear magnetic resonance (NMR) data, and it is widely known to be difficult. Compared with ab initio techniques, database techniques have the advantage of producing a higher proportion of predictions with subangstrom accuracy, but the disadvantage of also producing a higher proportion of clashing or highly inaccurate predictions. We introduce LoopWeaver, a database method that uses multidimensional scaling to achieve better, clash-free placement of loops obtained from a database of protein structures. This lets us keep the database approach's advantage while avoiding its disadvantage. Test results show that, before refinement, we achieve significantly better results than all other methods, including Modeler, Loopy, SuperLooper, and Rapper. With refinement, our results (LoopWeaver and Loopy consensus) are better than ROSETTA's: 0.42 Å RMSD on average for 206 length-6 loops, 0.64 Å local RMSD for 168 length-7 loops, 0.81 Å RMSD for 117 length-8 loops, and 0.98 Å RMSD for length-9 loops, while ROSETTA achieves 0.55, 0.79, 1.16, and 1.42 Å, respectively, at the same average time limit (3 hours). When we allow ROSETTA to run for over a week, it approaches, but does not surpass, our accuracy.
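The abstract names multidimensional scaling (MDS) without describing it. As a hedged illustration of the general technique only (this is not LoopWeaver's placement procedure, which the abstract does not specify), classical MDS recovers 3D coordinates from a matrix of pairwise distances. The coordinates are unique only up to rotation, reflection and translation. The function names and toy coordinates below are hypothetical, and the sketch assumes NumPy is available.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, 3) array of coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n points in `dim` dimensions.

    Double-centres the squared distances to obtain a Gram matrix, then
    uses its largest eigenpairs as coordinates.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix B = -1/2 * J D^(2) J, where D^(2) is element-wise squared
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh handles symmetric matrices and returns ascending eigenvalues
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dim]
    # Clip small negative eigenvalues caused by noise or non-Euclidean input
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy "atom" positions, scaled to roughly CA-CA spacing (Å)
    atoms = rng.normal(size=(8, 3)) * 3.8
    d = pairwise_distances(atoms)
    embedded = classical_mds(d, dim=3)
    # The embedding reproduces every input distance
    print(np.allclose(pairwise_distances(embedded), d))  # True
```

In a real loop-modeling setting the target distances would come from database fragments and steric constraints rather than from known coordinates, so they would generally not be exactly embeddable. The eigenvalue clipping is one simple way to handle that, though a practical method would likely also need to resolve the mirror-image ambiguity, since MDS cannot distinguish a structure from its reflection.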