Fromer Menachem, Yanover Chen
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
Bioinformatics. 2008 Jul 1;24(13):i214-22. doi: 10.1093/bioinformatics/btn168.
The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences that are then screened simultaneously for biological activity. Clearly, some choices of probability distributions will be more successful than others in yielding functional sequences. However, since the number of possible sequences grows exponentially with protein length, computationally optimizing the distribution is difficult.
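To make the library-generation step concrete, here is a minimal sketch. It assumes that each position is sampled independently from its own amino acid distribution. The three-residue design region and its probabilities (`POSITIONAL_PROBS`) are hypothetical illustrations rather than data from the paper, and `sample_library` is an illustrative helper name. The last lines show why exhaustive optimization is impractical: with 20 amino acids per freely designed position, the sequence space grows as 20^n.

```python
import random

# Hypothetical position-specific amino acid probabilities for a 3-residue
# design region (illustrative values only); each row sums to 1.
POSITIONAL_PROBS = [
    {"A": 0.5, "V": 0.3, "L": 0.2},
    {"K": 0.7, "R": 0.3},
    {"D": 0.4, "E": 0.4, "N": 0.2},
]


def sample_library(probs, n_seqs, seed=0):
    """Draw n_seqs sequences, choosing each position independently."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_seqs):
        residues = [rng.choices(list(p), weights=list(p.values()))[0] for p in probs]
        library.append("".join(residues))
    return library


library = sample_library(POSITIONAL_PROBS, n_seqs=5)
print(library)

# Sequence space for n freely designed positions grows as 20 ** n.
print(20 ** 3, 20 ** 50 > 10 ** 60)  # prints: 8000 True
```

Because positions are drawn independently, the distribution is fully described by one small probability table per position. The hard question the paper addresses is which tables to choose, not how to sample from them.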
In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. We construct the corresponding probabilistic graphical model and apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small-scale subproblems, BP attains results identical to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the predicted distributions differ significantly from the experimental data. These findings, together with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.
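The relationship between the Boltzmann model, BP and exact inference can be illustrated with a small sketch. It assumes an energy that decomposes into per-position terms E_i(a) and pairwise terms E_ij(a, b), so that P(s) is proportional to exp(-E(s)/T). The toy energies, the temperature parameter `T`, and the helper names (`make_toy_model`, `exact_marginals`, `bp_marginals`) are all hypothetical. This is generic sum-product BP, not the authors' implementation. On a tree-structured (here, chain) model BP recovers the exact marginals. On graphs with cycles it is only an approximation.

```python
import itertools
import math
import random


def make_toy_model(n_pos, n_aa, edges, seed=0):
    """Hypothetical toy energies: unary[i][a] and pair[(i, j)][a][b], arbitrary units."""
    rng = random.Random(seed)
    unary = [[rng.uniform(-1, 1) for _ in range(n_aa)] for _ in range(n_pos)]
    pair = {e: [[rng.uniform(-1, 1) for _ in range(n_aa)] for _ in range(n_aa)]
            for e in edges}
    return unary, pair


def energy(seq, unary, pair):
    """Total energy: sum of per-position terms plus pairwise terms."""
    e = sum(unary[i][a] for i, a in enumerate(seq))
    return e + sum(mat[seq[i]][seq[j]] for (i, j), mat in pair.items())


def exact_marginals(unary, pair, T=1.0):
    """Brute-force Boltzmann marginals; cost grows as n_aa ** n_pos."""
    n_pos, n_aa = len(unary), len(unary[0])
    marg = [[0.0] * n_aa for _ in range(n_pos)]
    Z = 0.0
    for seq in itertools.product(range(n_aa), repeat=n_pos):
        w = math.exp(-energy(seq, unary, pair) / T)
        Z += w
        for i, a in enumerate(seq):
            marg[i][a] += w
    return [[m / Z for m in row] for row in marg]


def bp_marginals(unary, pair, T=1.0, iters=50):
    """Sum-product BP: exact on trees, approximate on graphs with cycles."""
    n_pos, n_aa = len(unary), len(unary[0])
    phi = [[math.exp(-u / T) for u in row] for row in unary]

    def psi(i, j, a, b):
        # Pairwise Boltzmann factor for residue a at position i and b at position j.
        if (i, j) in pair:
            return math.exp(-pair[(i, j)][a][b] / T)
        return math.exp(-pair[(j, i)][b][a] / T)

    nbrs = {i: [] for i in range(n_pos)}
    for i, j in pair:
        nbrs[i].append(j)
        nbrs[j].append(i)
    msgs = {(i, j): [1.0] * n_aa for i in nbrs for j in nbrs[i]}

    for _ in range(iters):  # synchronous message updates
        new = {}
        for (i, j) in msgs:
            out = []
            for b in range(n_aa):
                total = 0.0
                for a in range(n_aa):
                    prod = phi[i][a] * psi(i, j, a, b)
                    for k in nbrs[i]:
                        if k != j:
                            prod *= msgs[(k, i)][a]
                    total += prod
                out.append(total)
            s = sum(out)
            new[(i, j)] = [x / s for x in out]  # normalize for numerical stability
        msgs = new

    beliefs = []
    for i in range(n_pos):
        b = [phi[i][a] * math.prod(msgs[(k, i)][a] for k in nbrs[i]) for a in range(n_aa)]
        s = sum(b)
        beliefs.append([x / s for x in b])
    return beliefs


chain = [(0, 1), (1, 2), (2, 3)]  # 4 positions, 3 toy "amino acids"
unary, pair = make_toy_model(4, 3, chain, seed=1)
exact = exact_marginals(unary, pair)
approx = bp_marginals(unary, pair)
diff = max(abs(x - y) for r1, r2 in zip(exact, approx) for x, y in zip(r1, r2))
print(diff < 1e-9)  # True: BP is exact on a tree-structured model
```

Brute-force enumeration is only feasible for tiny problems such as this one. BP's per-iteration cost, by contrast, scales with the number of interacting position pairs, which is what makes it applicable to full design problems.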