Lynch Miranda L, Dudek Max F, Bowman Sarah E J
High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA.
University of Pittsburgh, Pittsburgh, PA 15261, USA.
Patterns (N Y). 2020 Jul 10;1(4). doi: 10.1016/j.patter.2020.100024. Epub 2020 Apr 28.
Nearly 90% of structural models in the Protein Data Bank (PDB), the central resource worldwide for three-dimensional structural information, are currently derived from macromolecular crystallography (MX). A major bottleneck in determining MX structures is finding conditions in which a biomolecule will crystallize. Here, we present a searchable database of the chemicals associated with successful crystallization experiments from the PDB. We use these data to examine the relationship between protein secondary structure and average molecular weight of polyethylene glycol and to investigate patterns in crystallization conditions. Our analyses reveal striking patterns of both redundancy of chemical compositions in crystallization experiments and extreme sparsity of specific chemical combinations, underscoring the challenges faced in generating predictive models for optimal crystallization experiments.