Fusco Diana, Barnum Timothy J, Bruno Andrew E, Luft Joseph R, Snell Edward H, Mukherjee Sayan, Charbonneau Patrick
Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America; Department of Chemistry, Duke University, Durham, North Carolina, United States of America.
Department of Chemistry, Duke University, Durham, North Carolina, United States of America; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
PLoS One. 2014 Jul 2;9(7):e101123. doi: 10.1371/journal.pone.0101123. eCollection 2014.
X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well-diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved; hence, little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.