Wlodawer Alexander, Dauter Zbigniew, Rubach Pawel, Minor Wladek, Jaskolski Mariusz, Jiang Ziqiu, Jeffcott William, Anosova Olga, Kurlin Vitaliy
Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA.
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA.
Acta Crystallogr D Struct Biol. 2025 Apr 1;81(Pt 4):170-180. doi: 10.1107/S2059798325001883. Epub 2025 Mar 8.
A global analysis of protein crystal structures in the Protein Data Bank (PDB) using a newly developed computational approach reveals many pairs of entries with (nearly) identical main-chain coordinates. Such cases are identified and analyzed, showing that duplication is possible because the PDB does not currently have tools or mechanisms that would detect potentially duplicate submissions. Some duplicated entries represent modeling efforts of ligand binding that masquerade as experimentally determined structures. We propose that duplicate entries should either be obsoleted by the PDB or, at a minimum, marked with a clear 'CAVEAT' record that would alert potential users to the presence of such problems. We also suggest that using a tool for verifying the uniqueness of the deposited structure, such as that presented in this work, should become part of the routine validation procedure for new depositions.
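To illustrate what "(nearly) identical main-chain coordinates" could mean in practice, the following is a minimal sketch, not the approach developed in the paper. It assumes the main-chain atoms of two entries have already been extracted as equal-length, identically ordered lists of (x, y, z) tuples in ångströms, and that both entries are expressed in the same coordinate frame, so no superposition is attempted. The function names and the 0.01 Å tolerance are hypothetical choices made for the example.

```python
import math

def max_deviation(coords_a, coords_b):
    """Largest distance (in Å) between corresponding main-chain atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return max(math.dist(p, q) for p, q in zip(coords_a, coords_b))

def is_near_duplicate(coords_a, coords_b, tol=0.01):
    """Flag a pair whose main chains coincide to within `tol` Å.

    `tol` is an illustrative threshold, not a value taken from the paper.
    Pairs with different atom counts, or with no atoms, are not flagged.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        return False
    return max_deviation(coords_a, coords_b) <= tol

# Toy coordinates (not taken from any PDB entry).
entry_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.2, 0.0)]
entry_b = [(x + 0.004, y, z) for x, y, z in entry_a]  # tiny uniform shift
entry_c = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.7, 0.0)]  # one atom moved

print(is_near_duplicate(entry_a, entry_b))
print(is_near_duplicate(entry_a, entry_c))
```

A direct comparison like this would miss duplicates that have been rotated or translated relative to one another; catching those would require either superposing the structures first or comparing quantities that do not change under rigid motion.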