Sun Mark G F, Kim Philip M
Department of Computer Science, University of Toronto, Toronto, Canada.
Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada.
PLoS Comput Biol. 2017 Aug 24;13(8):e1005722. doi: 10.1371/journal.pcbi.1005722. eCollection 2017 Aug.
Protein design remains an important problem in computational structural biology. Current computational protein design methods are largely physics-based and use information from a single protein structure, even though multiple structures of many protein folds are now readily available in the PDB. While ensemble protein design methods can use multiple protein structures, they treat each structure independently. Here, we introduce a flexible backbone strategy, FlexiBaL-GP, which learns global protein backbone movements directly from multiple protein structures. FlexiBaL-GP uses a machine learning method, Gaussian Process Latent Variable Models (GPLVMs), to learn a lower-dimensional representation of the protein coordinates that best captures backbone movements. These learned backbone movements are used to explore alternative protein backbones while a protein is engineered within a parallel tempered MCMC framework. Using the human ubiquitin-USP21 complex as a model, we demonstrate that our design strategy outperforms current strategies for the interface design task of identifying tight-binding ubiquitin variants for USP21.
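As a minimal sketch of the GPLVM objective named in the abstract, the code below evaluates the standard GPLVM log marginal likelihood, log p(Y | X) = -(DN/2) log 2π - (D/2) log|K| - (1/2) tr(K⁻¹ Y Yᵀ), where Y stacks N aligned structures (each flattened into D mean-centred backbone coordinates) and X holds their low-dimensional latent points. The RBF kernel with fixed hyperparameters, the noise value, and every function name here are illustrative assumptions, not FlexiBaL-GP's published implementation; `gplvm_predict` shows one plausible way (the GP posterior mean) to map a latent point back to backbone coordinates.

```python
import math


def rbf(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) covariance between two latent points."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)


def kernel_matrix(X, noise=1e-2, **kw):
    """N x N covariance over latent points X, with noise added to the diagonal."""
    n = len(X)
    K = [[rbf(X[i], X[j], **kw) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += noise
    return K


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gplvm_log_likelihood(Y, X, noise=1e-2, **kw):
    """log p(Y | X): each of the D coordinate columns of Y (N structures x D,
    assumed mean-centred) is an independent zero-mean GP over latent points X."""
    N, D = len(Y), len(Y[0])
    L = cholesky(kernel_matrix(X, noise, **kw))
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(N))
    quad = 0.0  # tr(K^-1 Y Y^T) = sum over columns d of y_d^T K^-1 y_d
    for d in range(D):
        y_d = [Y[n][d] for n in range(N)]
        quad += sum(a * b for a, b in zip(y_d, cho_solve(L, y_d)))
    return -0.5 * (D * N * math.log(2.0 * math.pi) + D * log_det + quad)


def gplvm_predict(Y, X, x_star, noise=1e-2, **kw):
    """GP posterior mean k_*^T K^-1 Y of all D coordinates at a new latent
    point x_star (hypothetical helper: latent move -> candidate backbone)."""
    L = cholesky(kernel_matrix(X, noise, **kw))
    k_star = [rbf(x_star, x, **kw) for x in X]
    out = []
    for d in range(len(Y[0])):
        alpha = cho_solve(L, [row[d] for row in Y])
        out.append(sum(k * a for k, a in zip(k_star, alpha)))
    return out
```

In a GPLVM the latent points X (and usually the kernel hyperparameters) are the quantities fit by maximizing this likelihood with gradient-based optimization, commonly initialized from PCA; that optimization loop is omitted here to keep the sketch short.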
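The parallel tempered MCMC framework mentioned in the abstract can be illustrated with a generic replica-exchange sketch: several Metropolis chains run at different temperatures, and a random pair of neighboring chains proposes to swap states with acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/T. The toy energy, the single-point mutation move, and the six-residue `TARGET` below are hypothetical stand-ins for a structure-based binding score; in FlexiBaL-GP the state would also include learned backbone moves, which this sketch does not reproduce.

```python
import math
import random


def parallel_tempering(energy, propose, x0, temperatures, n_steps, seed=0):
    """Replica-exchange MCMC (needs at least two temperatures).
    energy(x) -> float, lower is better; propose(x, rng) -> NEW state drawn
    from a symmetric proposal. Returns the lowest-energy state seen."""
    rng = random.Random(seed)
    states = [x0 for _ in temperatures]
    energies = [energy(x0) for _ in temperatures]
    best_x, best_e = x0, energies[0]
    for _ in range(n_steps):
        # 1. Metropolis update within each replica at its own temperature.
        for i, T in enumerate(temperatures):
            cand = propose(states[i], rng)
            e_cand = energy(cand)
            delta = e_cand - energies[i]
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                states[i], energies[i] = cand, e_cand
                if e_cand < best_e:
                    best_x, best_e = cand, e_cand
        # 2. Propose swapping the states of one random adjacent temperature pair.
        i = rng.randrange(len(temperatures) - 1)
        b_i, b_j = 1.0 / temperatures[i], 1.0 / temperatures[i + 1]
        log_acc = (b_i - b_j) * (energies[i] - energies[i + 1])
        if log_acc >= 0 or rng.random() < math.exp(log_acc):
            states[i], states[i + 1] = states[i + 1], states[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return best_x, best_e


# Toy usage: "design" a six-residue sequence; all names below are hypothetical.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MQIFVK"


def toy_energy(seq):
    """Number of positions that differ from the hypothetical target."""
    return sum(a != b for a, b in zip(seq, TARGET))


def mutate(seq, rng):
    """Replace one uniformly chosen position with a uniformly chosen residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]


best, e = parallel_tempering(toy_energy, mutate, "AAAAAA", [0.1, 0.5, 2.0], 3000)
print(best, e)
```

The hot replicas accept unfavorable mutations more readily and pass diverse states down to the cold replica through swaps, which is the usual motivation for tempering when the energy landscape is rugged.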