Synchrotron Radiation Research Section, Macromolecular Crystallography Laboratory, NCI, Argonne National Laboratory, Argonne, IL 60439, USA.
Protein Structure Section, Macromolecular Crystallography Laboratory, NCI at Frederick, Frederick, MD 21702, USA.
IUCrJ. 2014 Apr 14;1(Pt 3):179-93. doi: 10.1107/S2052252514005442. eCollection 2014 May 1.
Whereas the vast majority of the more than 85 000 crystal structures of macromolecules currently deposited in the Protein Data Bank are of high quality, some suffer from a variety of imperfections. Although this fact has been pointed out in the past, it still merits periodic updates so that the metadata obtained by global analysis of the available crystal structures, as well as the use of individual structures for tasks such as drug design, are based on only the most reliable data. Here, selected abnormal deposited structures have been analysed based on the Bayesian reasoning that the correctness of a model must be judged against both the primary evidence and prior knowledge. These structures, together with information gained from the corresponding publications (if available), highlight some of the most prevalent types of problems. The errors are often perfect illustrations of the nature of human cognition, which is frequently influenced by preconceptions that may lead to fanciful results in the absence of proper validation. Common errors can be traced to negligence and a lack of rigorous verification of the models against electron density, creation of non-parsimonious models, generation of improbable numbers, application of incorrect symmetry, illogical presentation of the results, or violation of the rules of chemistry and physics. Paying more attention to such problems, not only in the final validation stages but also during the structure-determination process, is necessary to maintain the highest possible quality of the structural repositories and databases and, most of all, to provide a solid basis for subsequent studies, including large-scale data-mining projects.
For many scientists, PDB deposition is a rather infrequent event, so the need for proper training and supervision is emphasized, as is the need for constant alertness of reason and critical judgment, which are absolutely necessary safeguards against such problems. Ways of identifying the more problematic structures are suggested so that their users may be properly alerted to their possible shortcomings.