Using deep-learning predictions reveals a large number of register errors in PDB depositions.

Affiliations

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom.

European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany.

Publication information

IUCrJ. 2024 Nov 1;11(Pt 6):938-950. doi: 10.1107/S2052252524009114.

Abstract

The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map-model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3-5 Å resolution structures in the PDB. Unlike most methods, the application of this approach yields suggested corrections to the register of affected regions, which it is shown, even by limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors such as fold-switching proteins are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, helping to ensure the accuracy of future depositions.
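The core idea in the abstract, comparing the residue contacts observed in a model with those predicted computationally and proposing a register correction, can be illustrated with a toy sketch. This is not the authors' implementation or the API of the tool distributed in CCP4; the function names, the set-of-pairs contact representation, and the scoring rule are all assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # pair of residue numbers, stored as (low, high)


def shift_contacts(observed: Set[Contact], start: int, end: int, shift: int) -> Set[Contact]:
    """Renumber residues in [start, end] by `shift`; other residues keep their numbers.

    Toy simplification: collisions between renumbered and untouched residues
    are not resolved, and self-contacts created by renumbering are dropped.
    """
    def relabel(i: int) -> int:
        return i + shift if start <= i <= end else i

    shifted = set()
    for i, j in observed:
        a, b = sorted((relabel(i), relabel(j)))
        if a != b:
            shifted.add((a, b))
    return shifted


def agreement(observed: Set[Contact], predicted: Set[Contact]) -> float:
    """Fraction of predicted contacts that also appear in the model."""
    if not predicted:
        return 0.0
    return len(observed & predicted) / len(predicted)


def best_register_shift(
    observed: Set[Contact],
    predicted: Set[Contact],
    start: int,
    end: int,
    max_shift: int = 4,
) -> Tuple[int, Dict[int, float]]:
    """Try each shift of the segment and return the one agreeing best with the prediction.

    Ties are broken in favour of the smallest absolute shift, so a segment
    that already matches the prediction is reported with a shift of 0.
    """
    scores = {
        k: agreement(shift_contacts(observed, start, end, k), predicted)
        for k in range(-max_shift, max_shift + 1)
    }
    best = max(scores, key=lambda k: (scores[k], -abs(k)))
    return best, scores
```

In this sketch, a non-zero best shift for a segment flags a possible register error and doubles as the suggested correction. A real validation pipeline would work with predicted distances as well as contacts and would need to handle sequence gaps and confidence weighting, none of which is modelled here.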

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44ee/11533997/99668bbd0fda/m-11-00938-fig1.jpg
