School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
Proteins. 2010 Feb 15;78(3):530-47. doi: 10.1002/prot.22575.
In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to perform multispecific protein design efficiently and accurately. This framework uses recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. It is also designed specifically to yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and for synthesizing novel proteins with specific functionalities. We demonstrate the utility of such multispecific design techniques in better recovering amino acid sequence diversity similar to that produced by millions of years of evolution. We then compare the two approaches, prediction of low-energy ensembles and prediction of amino acid profiles, and demonstrate their complementarity in providing more robust predictions for protein design.
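As a rough illustration of the two outputs described above (low-energy sequences across several target states, and positional amino acid probability profiles), the following toy sketch may help. It is not the authors' framework: it replaces probabilistic graphical model inference with brute-force enumeration, assumes that the multistate objective is the plain sum of per-state energies, and assumes Boltzmann weighting with a hypothetical temperature parameter `kT`. The two-letter alphabet, the energy tables, and all function names are invented for the example.

```python
import itertools
import math

def make_state(tables):
    """Build a toy energy function from per-position energy tables (assumed form)."""
    def energy(seq):
        return sum(tables[i][aa] for i, aa in enumerate(seq))
    return energy

def multistate_energy(seq, states):
    # Assumption: the multistate objective is the sum of the per-state energies.
    return sum(state(seq) for state in states)

def lowest_energy_sequence(length, alphabet, states):
    # Brute-force search over all sequences; feasible only for tiny toy problems.
    return min(itertools.product(alphabet, repeat=length),
               key=lambda seq: multistate_energy(seq, states))

def positional_profiles(length, alphabet, states, kT=1.0):
    # Boltzmann-weight every sequence, then marginalize onto each position.
    weights = {seq: math.exp(-multistate_energy(seq, states) / kT)
               for seq in itertools.product(alphabet, repeat=length)}
    z = sum(weights.values())
    profiles = [{aa: 0.0 for aa in alphabet} for _ in range(length)]
    for seq, w in weights.items():
        for i, aa in enumerate(seq):
            profiles[i][aa] += w / z
    return profiles

# Hypothetical two-position design problem with two target states.
alphabet = "AL"
state_1 = make_state([{"A": -1.0, "L": 0.0}, {"A": 0.0, "L": 0.0}])
state_2 = make_state([{"A": -1.0, "L": 0.0}, {"A": 0.0, "L": -2.0}])
states = [state_1, state_2]

best = lowest_energy_sequence(2, alphabet, states)
profiles = positional_profiles(2, alphabet, states)
print(best, profiles)
```

The enumeration scales exponentially with sequence length, which is why approximate inference of the kind the abstract mentions is needed for realistic problems. A profile such as `profiles[0]` could, in principle, be read as the per-position amino acid frequencies used to bias a screening library.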