Modeling structurally variable regions in homologous proteins with Rosetta.

Author information

Rohl Carol A, Strauss Charlie E M, Chivian Dylan, Baker David

Affiliation

Department of Biomolecular Engineering, University of California, Santa Cruz 95064, USA.

Publication information

Proteins. 2004 May 15;55(3):656-77. doi: 10.1002/prot.10629.

DOI:10.1002/prot.10629

PMID:15103629

Abstract

A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential, in combination with a modified Newton's method for gradient descent minimization, is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4-, 8-, and 12-residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure. This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure Prediction 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.
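The abstract names three quantitative ingredients: a root-mean-square deviation (RMSD) accuracy measure, a gap closure term that penalizes a broken backbone, and Monte Carlo acceptance of trial conformations. The Python sketch below illustrates each in its simplest textbook form. It is not the Rosetta implementation: the function names, the quadratic form of the penalty, the `weight` parameter, and the standard Metropolis rule are all assumptions chosen for illustration, and the RMSD is computed without superposition, as is natural when a loop is scored in the frame of a fixed template.

```python
import math
import random


def rmsd(model, reference):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    No superposition is performed: both coordinate sets are assumed to
    already share one reference frame (e.g. a fixed template).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(model, reference):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(model))


def gap_penalty(loop_end, anchor, weight=1.0):
    """Hypothetical gap closure term (an assumption, not Rosetta's form).

    Returns weight times the squared distance between the last atom of the
    modeled segment and the template atom it must connect to, so the
    penalty is zero only when the chain is continuous.
    """
    return weight * sum((a - b) ** 2 for a, b in zip(loop_end, anchor))


def metropolis_accept(delta_score, temperature, rng=random):
    """Standard Metropolis criterion for a trial move.

    Moves that lower the score are always accepted; moves that raise it
    are accepted with probability exp(-delta_score / temperature).
    """
    if delta_score <= 0:
        return True
    return rng.random() < math.exp(-delta_score / temperature)
```

In a refinement loop of this kind, the total score of a trial conformation would include `gap_penalty` alongside the other energy terms, and `metropolis_accept` would decide whether to keep the move; `rmsd` is used only afterwards, to compare a prediction with the experimental structure.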
