Martin A C, MacArthur M W, Thornton J M
Department of Biochemistry and Molecular Biology, University College London, United Kingdom.
Proteins. 1997;Suppl 1:14-28. doi: 10.1002/(sici)1097-0134(1997)1+<14::aid-prot4>3.3.co;2-f.
An assessment is presented for all submissions to the comparative modeling challenge in the 1996 Critical Assessment of Structure Prediction (CASP2). Of the original 12 target structures, 9 were solved prior to the meeting: 8 by X-ray crystallography and 1 by NMR spectroscopy. These targets varied over a large range of difficulty, as assessed by the percentage sequence identity with the principal parent structure, which ranged from 20% up to 85%. The overall quality of the models reflected the similarity of the principal parent. As expected, when the sequence alignment was correct, the core was accurately modeled, with the largest deviations occurring in the loops. Models were built which gave Cα root-mean-square deviations (RMSDs) compared with the observed structure of < 1 Å for targets with high parental similarity; even at 26% sequence identity, the best model structures had Cα deviations of only 2.2 Å. Overall, these deviations are comparable with those observed between the parent structure and the target, but locally there are several examples where the model approaches closer to the target than does the parent. There were three targets below 25% sequence identity, and the models generated for these targets were, in general, significantly less accurate. This principally reflects errors in the alignment which, if systematically shifted, can generate Cα RMSDs > 18 Å. Compared with CASP1, the geometry of the models was significantly improved with no D-amino acids. By far the major contribution to RMSD error was the alignment accuracy, which varied from 100% down to 7% over the range of targets. In the structurally variable regions, global shifts, caused by hinge bending, were the major source of error, giving significantly lower local RMSDs than global RMSDs. In over 50% of these noncore regions, the difference between global and local RMSDs was more than 3 Å, and was as high as 10 Å for one structurally variable region.
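To make the link between alignment register and Cα RMSD concrete, the following is a minimal sketch rather than the assessors' procedure. It computes RMSD over matched coordinates that are assumed to be already superposed (the fitting step is omitted), and applies it to a hypothetical, perfectly straight toy chain with consecutive Cα atoms spaced 3.8 Å apart. The names `rmsd`, `toy_chain`, and `shifted_rmsd` are illustrative only. Whether such an RMSD counts as global or local depends solely on which atoms were used for the superposition, which is not modeled here.

```python
import math

CA_SPACING = 3.8  # approximate distance between consecutive C-alpha atoms, in angstroms


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two sets are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def toy_chain(n_residues):
    """Hypothetical straight chain: one C-alpha every CA_SPACING along x."""
    return [(i * CA_SPACING, 0.0, 0.0) for i in range(n_residues)]


def shifted_rmsd(chain, shift):
    """RMSD when model residue i is placed where target residue i + shift sits.

    This mimics a systematic alignment shift of `shift` residues.
    """
    n = len(chain) - shift
    return rmsd(chain[:n], chain[shift:shift + n])


if __name__ == "__main__":
    chain = toy_chain(20)
    for shift in (0, 1, 3, 5):
        print(shift, round(shifted_rmsd(chain, shift), 2))
```

In this toy chain every atom moves by exactly `shift` times 3.8 Å, so the RMSD grows linearly with the register error. Real chains are not straight, so the numbers only illustrate why a systematic shift of a few residues can dominate the error; they are not a reproduction of the reported values.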
For the side chains, the χ1 RMSDs are strongly correlated with the Cα RMSDs. For models with Cα deviations less than 1 Å, on average 78.5% of side chains are placed in the correct rotamer, although the χ1 RMSDs, though clearly better than random, were disappointing at around 46 degrees. As the backbone deviations increased, the side chain placement became less accurate, with an average χ1 RMSD of 75 degrees on a 1.5-2.5 Å Cα backbone (average 51.4% correct rotamer). Refinement by energy minimization or molecular dynamics made only minor adjustments to improve local geometry and generally made small, but not significant, improvements to the RMSD. In total, 19 groups submitted 62 models (89 coordinate sets) that could be assessed. Most modelers used manual adjustments to sequence alignments and, in general, good alignments were obtained down to 25% sequence identity. The modeling methods ranged from "classical" modeling, involving core building followed by loop and side chain addition, to more sophisticated approaches based on probability distributions, Monte Carlo sampling or distance geometry. For each target, several groups produced equally good models, given the expected errors in the structures (about 0.5 Å). No one method came out as clearly superior, although the approaches that inherit directly from the parents generally performed better than the more radical techniques. However, for each target there were some poor models, usually reflecting a poor sequence alignment, and the range of accuracy for each target is therefore large. Fully automated methods are able to perform very well for "easy" targets (85% sequence identity with parent), but when modeling using a distantly related parent, care and expertise, especially in performing the alignment, still appear to be important factors in generating accurate models.
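The two side-chain measures quoted above, the χ1 RMSD and the percentage of correct rotamers, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the assessors' code: the abstract does not define "correct rotamer", so the example assumes a side chain is correct when model and target χ1 fall in the same 120-degree well (centered near +60, 180, and -60 degrees). The function names are hypothetical.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def chi1_rmsd(model_chi1, target_chi1):
    """Angular RMSD in degrees, with wrap-around handled per residue."""
    if len(model_chi1) != len(target_chi1) or not model_chi1:
        raise ValueError("angle lists must be non-empty and equal in length")
    diffs = [angle_diff(m, t) for m, t in zip(model_chi1, target_chi1)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def rotamer_bin(chi1):
    """Assumed binning: 0 for [0, 120), 1 for [120, 240), 2 for [240, 360)."""
    return int((chi1 % 360.0) // 120.0)


def fraction_correct_rotamer(model_chi1, target_chi1):
    """Fraction of residues whose model and target chi1 share a bin."""
    if len(model_chi1) != len(target_chi1) or not model_chi1:
        raise ValueError("angle lists must be non-empty and equal in length")
    same = sum(
        1 for m, t in zip(model_chi1, target_chi1)
        if rotamer_bin(m) == rotamer_bin(t)
    )
    return same / len(model_chi1)


if __name__ == "__main__":
    # Made-up angles for four residues, in degrees.
    model = [-65.0, 175.0, 60.0, -170.0]
    target = [-60.0, -178.0, 180.0, 65.0]
    print(chi1_rmsd(model, target))
    print(fraction_correct_rotamer(model, target))
```

The wrap-around in `angle_diff` matters because 175 and -178 degrees differ by only 7 degrees, not 353. A few side chains in the wrong well contribute deviations of about 120 degrees each, which is one way a χ1 RMSD can look poor even when most rotamers are correct.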