Department of Biochemistry, Duke University, Durham, NC 27710, USA.
J Struct Biol. 2018 Nov;204(2):301-312. doi: 10.1016/j.jsb.2018.08.007. Epub 2018 Aug 11.
We find that the methods used in the CryoEM Model Challenge, although quite good overall, could still benefit greatly from several strategies for improving local conformations. Our assessments primarily use validation criteria from the MolProbity web service. Those criteria include MolProbity's all-atom contact analysis and updated versions of standard conformational validations for protein and RNA, plus two recent additions: first, flags for cis-nonPro and twisted peptides, and second, the CaBLAM system for diagnosing secondary structure, validating Cα backbone, and validating adjacent peptide CO orientations in the context of the Cα trace. In general, automated ab initio building of starting models is quite good at backbone connectivity but often fails at local conformation or sequence register, especially at resolutions poorer than 3.5 Å. However, we show that even when criteria (such as Ramachandran or rotamer) are explicitly restrained to improve refinement behavior and overall validation scores, automated optimization of a deposited structure seldom corrects specific misfittings that start in the wrong local minimum; it just hides them. Therefore, local problems should be identified, and as many as possible corrected, before starting refinement. Secondary structures are confusing at 3-4 Å but can be better recognized at 6-8 Å. In future model challenges, the specific steps being tested (such as segmentation) and the required documentation (such as the PDB code of the starting model) should each be explicitly defined, so that competing methods on a given task can be meaningfully compared. Individual local examples are presented here to show what local mistakes and corrections look like in 3D, how they probably arise, and what possible improvements to methodology might help avoid them. At these resolutions, both structural biologists and end-users need meaningful estimates of local uncertainty, perhaps through explicit ensembles. Fitting problems can best be diagnosed by validation that spans multiple residues; CaBLAM is such a multi-residue tool, and its effectiveness is demonstrated.
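The cis-nonPro and twisted-peptide flags mentioned above rest on the peptide ω dihedral, defined by the atoms Cα(i), C(i), N(i+1), Cα(i+1). The following is a minimal sketch of how such a flag could be computed, not MolProbity's actual implementation. The function names, the label strings, and the 30° tolerance from planarity are assumptions made for illustration.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)      # normals of the two planes
    b1_len = math.sqrt(dot(b1, b1))
    x = dot(n1, n2)
    y = dot(b1, cross(n1, n2)) / b1_len
    return math.degrees(math.atan2(y, x))

def classify_omega(omega_deg, tolerance_deg=30.0):
    """Label a peptide bond as cis, trans, or twisted.

    tolerance_deg is an assumed cutoff: bonds farther than this from
    both planar forms (0 and 180 degrees) are labeled twisted.
    """
    if abs(omega_deg) <= tolerance_deg:
        return "cis"
    if 180.0 - abs(omega_deg) <= tolerance_deg:
        return "trans"
    return "twisted"

def flag_peptide(omega_deg, next_resname, tolerance_deg=30.0):
    """Hypothetical flag: cis bonds are reported as cis-nonPro unless
    the residue following the bond (residue i+1) is proline."""
    label = classify_omega(omega_deg, tolerance_deg)
    if label == "cis" and next_resname.upper() != "PRO":
        return "cis-nonPro"
    return label
```

In use, the four atom coordinates for each pair of consecutive residues would be passed to `dihedral`, and the result, together with the name of residue i+1, to `flag_peptide`. Any residue pair not labeled `trans` (or `cis` before proline) would then be a candidate for inspection against the map before refinement.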