The bad and the good of trends in model building and refinement for sparse-data regions: pernicious forms of overfitting versus good new tools and predictions.

Affiliation

Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, USA.

Publication information

Acta Crystallogr D Struct Biol. 2023 Dec 1;79(Pt 12):1071-1078. doi: 10.1107/S2059798323008847. Epub 2023 Nov 3.

DOI:10.1107/S2059798323008847

PMID:37921807

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10833350/

Abstract

Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90-180°. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.
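
As a rough illustration of the 90-180° peptide misorientation described above, the sketch below measures the angle between the backbone C→O vectors of the same peptide in two models (for example, a deposited model and a rebuilt one). It is only a minimal sketch under stated assumptions: the function names, the 90° threshold, and the coordinates are hypothetical, the two models are assumed to be already superposed in the same frame, and this is not how CaBLAM, PDB-REDO, or any other tool named in the abstract is implemented.

```python
import math


def carbonyl_angle_deg(c_a, o_a, c_b, o_b):
    """Angle in degrees between the C->O vectors of one peptide in two models.

    Each argument is an (x, y, z) tuple in angstroms; both models are assumed
    to share the same coordinate frame.
    """
    v = [o - c for c, o in zip(c_a, o_a)]
    w = [o - c for c, o in zip(c_b, o_b)]
    dot = sum(x * y for x, y in zip(v, w))
    norm = math.sqrt(sum(x * x for x in v)) * math.sqrt(sum(y * y for y in w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_possible_flip(angle_deg, threshold_deg=90.0):
    """Flag a peptide whose carbonyl differs by at least threshold_deg.

    The 90 degree default is an illustrative assumption taken from the
    90-180 degree range quoted in the abstract.
    """
    return angle_deg >= threshold_deg


# Hypothetical coordinates: the carbonyl O points in opposite directions.
c_atom = (0.0, 0.0, 0.0)
o_deposited = (1.23, 0.0, 0.0)
o_rebuilt = (-1.23, 0.0, 0.0)

flip_angle = carbonyl_angle_deg(c_atom, o_deposited, c_atom, o_rebuilt)
same_angle = carbonyl_angle_deg(c_atom, o_deposited, c_atom, o_deposited)
print(is_possible_flip(flip_angle), is_possible_flip(same_angle))
```

A comparison like this only says that two models disagree about a peptide orientation; deciding which orientation is correct still needs the kind of evidence the abstract discusses, such as validation tools or interactive rebuilding against the map.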

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1055/10833350/7ef7c6fbf93f/d-79-01071-fig1.jpg

Similar articles

Assessment of detailed conformations suggests strategies for improving cryoEM models: Helix at lower resolution, ensembles, pre-refinement fixups, and validation at multi-residue length scale.

J Struct Biol. 2018 Nov;204(2):301-312. doi: 10.1016/j.jsb.2018.08.007. Epub 2018 Aug 11.

Residue-level error detection in cryoelectron microscopy models.

Structure. 2023 Jul 6;31(7):860-869.e4. doi: 10.1016/j.str.2023.05.002. Epub 2023 May 29.

Predicting protein model correctness in Coot using machine learning.

Acta Crystallogr D Struct Biol. 2020 Aug 1;76(Pt 8):713-723. doi: 10.1107/S2059798320009080. Epub 2020 Jul 27.

New tools in MolProbity validation: CaBLAM for CryoEM backbone, UnDowser to rethink "waters," and NGL Viewer to recapture online 3D graphics.

Protein Sci. 2020 Jan;29(1):315-329. doi: 10.1002/pro.3786. Epub 2019 Dec 10.

EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy.

Nat Methods. 2015 Oct;12(10):943-6. doi: 10.1038/nmeth.3541. Epub 2015 Aug 17.

Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions.

Acta Crystallogr D Biol Crystallogr. 2015 Jan 1;71(Pt 1):136-53. doi: 10.1107/S1399004714021683.

Current developments in Coot for macromolecular model building of Electron Cryo-microscopy and Crystallographic Data.

Protein Sci. 2020 Apr;29(4):1069-1078. doi: 10.1002/pro.3791. Epub 2020 Mar 2.

Real-space quantum-based refinement for cryo-EM: Q|R#3.

Acta Crystallogr D Struct Biol. 2020 Dec 1;76(Pt 12):1184-1191. doi: 10.1107/S2059798320013194. Epub 2020 Nov 19.

Model validation: local diagnosis, correction and when to quit.

Acta Crystallogr D Struct Biol. 2018 Feb 1;74(Pt 2):132-142. doi: 10.1107/S2059798317009834.

Cited by

AlphaFold-guided molecular replacement for solving challenging crystal structures.

Acta Crystallogr D Struct Biol. 2025 Jan 1;81(Pt 1):4-21. doi: 10.1107/S2059798324011999.

References

Evolutionary-scale prediction of atomic-level protein structure with a language model.

Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.

Likelihood-based docking of models into cryo-EM maps.

Acta Crystallogr D Struct Biol. 2023 Apr 1;79(Pt 4):281-289. doi: 10.1107/S2059798323001602. Epub 2023 Mar 15.

Likelihood-based signal and noise analysis for docking of models into cryo-EM maps.

Acta Crystallogr D Struct Biol. 2023 Apr 1;79(Pt 4):271-280. doi: 10.1107/S2059798323001596. Epub 2023 Mar 15.

Accelerating crystal structure determination with iterative AlphaFold prediction.

Acta Crystallogr D Struct Biol. 2023 Mar 1;79(Pt 3):234-244. doi: 10.1107/S205979832300102X. Epub 2023 Feb 27.

Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling.

Nat Commun. 2022 Dec 21;13(1):7846. doi: 10.1038/s41467-022-35593-8.

Improved AlphaFold modeling with implicit experimental information.

Nat Methods. 2022 Nov;19(11):1376-1382. doi: 10.1038/s41592-022-01645-6. Epub 2022 Oct 20.

ColabFold: making protein folding accessible to all.

Nat Methods. 2022 Jun;19(6):679-682. doi: 10.1038/s41592-022-01488-1. Epub 2022 May 30.

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061.

AlphaFold and Implications for Intrinsically Disordered Proteins.

J Mol Biol. 2021 Oct 1;433(20):167208. doi: 10.1016/j.jmb.2021.167208. Epub 2021 Aug 18.

Highly accurate protein structure prediction with AlphaFold.

Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.
