Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands.
San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California, 92093-0505.
Protein Sci. 2018 Mar;27(3):798-808. doi: 10.1002/pro.3353. Epub 2017 Dec 8.
The Protein Data Bank (PDB) is the global archive for structural information on macromolecules and a popular resource for researchers, teachers, and students, attracting more than one million unique users each year. Crystallographic structure models in the PDB (more than 100,000 entries) are optimized against the crystal diffraction data and geometrical restraints. This process of crystallographic refinement typically ignores hydrogen bond (H-bond) distances as a source of information. However, H-bond restraints can improve structures at low resolution, where diffraction data are limited. To improve low-resolution structure refinement, we present methods for deriving H-bond information either globally, from well-refined high-resolution structures in the PDB-REDO databank, or specifically, from sets of homologous high-resolution structures constructed on the fly. Refinement incorporating HOmology DErived Restraints (HODER) improves geometrical quality and the fit to the diffraction data for many low-resolution structures. To make these improvements readily available to the general public, we applied our new algorithms to all crystallographic structures in the PDB: using massively parallel computing, we constructed a new instance of the PDB-REDO databank (https://pdb-redo.eu). This resource helps researchers gain insight into individual structures, into specific protein families (as we demonstrate with examples), and into general features of protein structure through data mining on a uniformly treated dataset.