Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining.

Affiliations

Synchrotron Radiation Research Section, Macromolecular Crystallography Laboratory, NCI, Argonne National Laboratory, Argonne, IL 60439, USA.

Protein Structure Section, Macromolecular Crystallography Laboratory, NCI at Frederick, Frederick, MD 21702, USA.

Publication information

IUCrJ. 2014 Apr 14;1(Pt 3):179-93. doi: 10.1107/S2052252514005442. eCollection 2014 May 1.

DOI: 10.1107/S2052252514005442
PMID: 25075337
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4086436/
Abstract

Whereas the vast majority of the more than 85 000 crystal structures of macromolecules currently deposited in the Protein Data Bank are of high quality, some suffer from a variety of imperfections. Although this fact has been pointed out in the past, it is still worth periodic updates so that the metadata obtained by global analysis of the available crystal structures, as well as the utilization of the individual structures for tasks such as drug design, should be based on only the most reliable data. Here, selected abnormal deposited structures have been analysed based on the Bayesian reasoning that the correctness of a model must be judged against both the primary evidence as well as prior knowledge. These structures, as well as information gained from the corresponding publications (if available), have emphasized some of the most prevalent types of common problems. The errors are often perfect illustrations of the nature of human cognition, which is frequently influenced by preconceptions that may lead to fanciful results in the absence of proper validation. Common errors can be traced to negligence and a lack of rigorous verification of the models against electron density, creation of non-parsimonious models, generation of improbable numbers, application of incorrect symmetry, illogical presentation of the results, or violation of the rules of chemistry and physics. Paying more attention to such problems, not only in the final validation stages but during the structure-determination process as well, is necessary not only in order to maintain the highest possible quality of the structural repositories and databases but most of all to provide a solid basis for subsequent studies, including large-scale data-mining projects. 
For many scientists PDB deposition is a rather infrequent event, so the need for proper training and supervision is emphasized, as well as the need for constant alertness of reason and critical judgment as absolutely necessary safeguarding measures against such problems. Ways of identifying more problematic structures are suggested so that their users may be properly alerted to their possible shortcomings.
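
The abstract lists "generation of improbable numbers" among the common errors and mentions flagging problematic structures for users. As a rough illustration only, and not the authors' procedure, the sketch below shows how a data-mining pipeline might screen entries for a few physically implausible values before using them. The `EntrySummary` record, its field names, and the three checks are assumptions chosen for this example; the entry identifiers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntrySummary:
    """Hypothetical per-entry summary; field names are illustrative only."""
    entry_id: str
    r_work: float          # working R-factor, as a fraction (e.g. 0.18)
    r_free: float          # free R-factor, as a fraction
    min_occupancy: float   # smallest atomic occupancy in the model
    max_occupancy: float   # largest atomic occupancy in the model
    min_b_factor: float    # smallest atomic displacement parameter (B)


def plausibility_flags(entry: EntrySummary) -> List[str]:
    """Return human-readable warnings for values that deserve a second look."""
    flags = []
    # Occupancy is a fraction of unit cells containing the atom: 0 to 1.
    if entry.min_occupancy < 0.0 or entry.max_occupancy > 1.0:
        flags.append("occupancy outside [0, 1]")
    # A zero or negative B-factor has no physical meaning.
    if entry.min_b_factor <= 0.0:
        flags.append("non-positive B-factor")
    # R-free is computed on reflections excluded from refinement,
    # so a value below R-work is unusual and worth checking.
    if entry.r_free < entry.r_work:
        flags.append("R-free below R-work")
    return flags


if __name__ == "__main__":
    # Made-up entries, not real depositions.
    entries = [
        EntrySummary("demo-ok", 0.18, 0.22, 0.5, 1.0, 8.0),
        EntrySummary("demo-odd", 0.25, 0.20, 0.0, 1.3, -2.0),
    ]
    for e in entries:
        print(e.entry_id, plausibility_flags(e))
```

A flag here means only that an entry should be inspected against its electron density and publication before it is included in a global analysis; it does not by itself show that the model is wrong.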

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/3b147c8f1ef7/m-01-00179-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/a13ae4dc1007/m-01-00179-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/e68fc10e4791/m-01-00179-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/49e68dfdd206/m-01-00179-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/b9524b3fb615/m-01-00179-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/ac4e260f7683/m-01-00179-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/170c0a2f177e/m-01-00179-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/76e7a0bed886/m-01-00179-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/31ba7ad2db2d/m-01-00179-fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77e9/4086436/96f59ca81e56/m-01-00179-fig10.jpg
