Hintze Bradley J, Lewis Steven M, Richardson Jane S, Richardson David C
Department of Biochemistry, Duke University, Durham, North Carolina 27710.
Proteins. 2016 Sep;84(9):1177-89. doi: 10.1002/prot.25039. Epub 2016 Jun 23.
Here we describe the updated MolProbity rotamer-library distributions derived from an order-of-magnitude larger and more stringently quality-filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only side-chains with satisfactory justification for their given conformation, we added residue-specific filters for electron-density value and model-to-density fit. The combined new protocol retains a million residues of data, while cleaning up false-positive noise in the multi-χ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000-fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond-angle deviations in enabling presumably strained side-chain conformations. Further, along with favored and outlier, an allowed category (0.3-2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model-building and refinement. The multi-dimensional χ distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed "ultimate" because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate side-chain with backbone criteria. Proteins 2016; 84:1177-1189. © 2016 Wiley Periodicals, Inc.
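The three validation categories named in the abstract can be illustrated with a minimal sketch. It assumes that each side-chain conformation has already been assigned an occurrence value (in percent) from the reference distributions; the function name, the enum, and the handling of values that fall exactly on a boundary are illustrative assumptions, not MolProbity or PHENIX code. Only the 0.3-2.0% range for the allowed category comes from the abstract.

```python
from enum import Enum


class RotamerCategory(Enum):
    FAVORED = "favored"
    ALLOWED = "allowed"
    OUTLIER = "outlier"


# The abstract gives 0.3-2.0% occurrence as the "allowed" range.
# Treating both bounds as inclusive lower limits is an assumption.
ALLOWED_MIN_PERCENT = 0.3
FAVORED_MIN_PERCENT = 2.0


def classify_rotamer(occurrence_percent: float) -> RotamerCategory:
    """Map an occurrence value (percent) to a validation category.

    Hypothetical helper: the real tools may compute and round the
    underlying score differently.
    """
    if not 0.0 <= occurrence_percent <= 100.0:
        raise ValueError("occurrence_percent must be between 0 and 100")
    if occurrence_percent >= FAVORED_MIN_PERCENT:
        return RotamerCategory.FAVORED
    if occurrence_percent >= ALLOWED_MIN_PERCENT:
        return RotamerCategory.ALLOWED
    return RotamerCategory.OUTLIER


if __name__ == "__main__":
    for value in (45.0, 1.2, 0.05):
        print(f"{value:5.2f}% -> {classify_rotamer(value).value}")
```

Under these assumptions, a conformation scoring 1.2% would be reported as allowed rather than as an outlier, which mirrors the three-tier scheme used for Ramachandran validation.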