Chen Jianhan, Brooks Charles L
Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA.
Proteins. 2007 Jun 1;67(4):922-30. doi: 10.1002/prot.21345.
Recent advances in efficient and accurate treatment of solvent with the generalized Born (GB) approximation have made it possible to substantially refine the protein structures generated by various prediction tools through detailed molecular dynamics simulations. As demonstrated in a recent CASPR experiment, improvement can be achieved quite reliably when the initial models are sufficiently close to the native basin (e.g., 3-4 Å Cα RMSD). A key element of effective refinement is incorporating reliable structural information into the simulation protocol. Without intimate knowledge of the target or of the prediction protocol used to generate the initial structural models, it can be assumed that the regular secondary structure elements (helices and strands) and the overall fold topology are largely correct to start with, so that the protocol limits itself to the scope of refinement and focuses the sampling in the vicinity of the initial structure. The secondary structures can be enforced by dihedral restraints and the topology through structural contacts, implemented as either multiple pair-wise Cα distance restraints or a single sidechain distance matrix restraint. The restraints are weakly imposed with flat-bottom potentials to allow sufficient flexibility for structural rearrangement. Refinement is further facilitated by enhanced sampling with advanced techniques such as the replica exchange method (REX). In general, for single-domain proteins of small to medium size, 3-5 nanoseconds of REX/GB refinement simulations appear to be sufficient for reasonable convergence. Clustering of the resulting structural ensembles can yield refined models over 1.0 Å closer to the native structure in Cα RMSD. Substantial improvement of sidechain contacts and rotamer states can also be achieved in most cases. Additional improvement is possible with longer sampling and with knowledge of the robust structural features in the initial models for a given prediction protocol. Nevertheless, limitations still exist in sampling as well as in force field accuracy, manifested as difficulty in refining long and flexible loops.
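The two technical ingredients named above, flat-bottom restraints and replica exchange, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no functional forms or parameters, so the harmonic-outside-the-well form, the function names, and the arguments `half_width` and `k` are assumptions made for illustration. The exchange test is the standard Metropolis criterion for temperature replica exchange, with energies assumed to be in kcal/mol.

```python
import math

# Boltzmann constant in kcal/(mol*K); energies below are assumed to be in kcal/mol.
K_B = 0.0019872041


def flat_bottom_energy(d, d0, half_width, k):
    """Illustrative flat-bottom restraint (assumed form, not taken from the paper).

    The energy is zero while the distance d stays within half_width of the
    reference d0, and grows harmonically with force constant k outside it.
    """
    excess = abs(d - d0) - half_width
    if excess <= 0.0:
        return 0.0  # inside the flat region: no penalty, free rearrangement
    return k * excess ** 2


def rex_accept_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping two replicas in temperature REX.

    e_i, e_j are the potential energies of the configurations currently held
    at temperatures t_i and t_j (in kelvin).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # swap always accepted
    return math.exp(delta)


if __name__ == "__main__":
    # A Calpha pair restrained around 6.0 Å with a 1.0 Å tolerance (made-up values).
    for d in (5.5, 7.0, 9.0):
        print(d, flat_bottom_energy(d, d0=6.0, half_width=1.0, k=1.0))
    # Swap attempt between neighbouring temperatures (made-up energies).
    print(rex_accept_probability(-120.0, -100.0, 300.0, 320.0))
```

The flat region is what makes such a restraint weak: small deviations from the initial model cost nothing, so local rearrangement stays free while large departures from the assumed fold are penalized.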