A computational framework to empower probabilistic protein design.

Authors

Fromer Menachem, Yanover Chen

Affiliation

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.

Publication

Bioinformatics. 2008 Jul 1;24(13):i214-22. doi: 10.1093/bioinformatics/btn168.

DOI:10.1093/bioinformatics/btn168

PMID:18586717

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2718646/

Abstract

MOTIVATION

The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult.
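As a rough illustration of the library idea described above, the following Python sketch draws sequences position by position from independent amino acid distributions and computes the size of the full sequence space. The probability values, the three-position example, and the function names are hypothetical and are not taken from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_library(position_probs, n_sequences, seed=0):
    """Draw sequences by sampling each position independently.

    position_probs: one dict per design position, mapping an amino acid
    letter to its probability at that position (hypothetical values).
    """
    rng = random.Random(seed)
    library = []
    for _ in range(n_sequences):
        letters = []
        for probs in position_probs:
            residues = list(probs.keys())
            weights = list(probs.values())
            letters.append(rng.choices(residues, weights=weights)[0])
        library.append("".join(letters))
    return library


def sequence_space_size(n_positions, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences: exponential in the number of positions."""
    return alphabet_size ** n_positions


# Hypothetical positional distributions for a three-position design.
example_probs = [
    {"A": 0.7, "G": 0.3},
    {"L": 1.0},
    {"K": 0.5, "R": 0.5},
]
library = sample_library(example_probs, n_sequences=5)
print(library)
print(sequence_space_size(100))  # far too many sequences to enumerate
```

Because every position is sampled independently, the library is fully specified by one probability vector per position, which is what makes the choice of those vectors the object of optimization.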

RESULTS

In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small-scale subproblems, BP attains results identical to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the predicted distributions differ significantly from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.
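The pipeline summarized above (a Boltzmann distribution over sequences, a graphical model, and BP for positional marginals) can be sketched on a toy problem. The sketch below assumes a simplified pairwise energy E(s) = sum_i E_i(s_i) + sum_(i,j) E_ij(s_i, s_j) with P(s) proportional to exp(-E(s)/kT), uses random placeholder energies, and ignores side-chain conformations and every detail of the paper's actual energy function. The function names and the chain-shaped interaction graph are assumptions chosen for illustration; on a chain (a tree), sum-product BP is known to be exact, so its marginals can be checked against brute-force enumeration.

```python
import itertools
import numpy as np


def exact_marginals(unary, pair, kT=1.0):
    """Brute-force marginals of P(s) proportional to exp(-E(s)/kT)."""
    n, k = unary.shape
    marg = np.zeros((n, k))
    for seq in itertools.product(range(k), repeat=n):
        energy = sum(unary[i, seq[i]] for i in range(n))
        energy += sum(e[seq[i], seq[j]] for (i, j), e in pair.items())
        weight = np.exp(-energy / kT)
        for i in range(n):
            marg[i, seq[i]] += weight
    return marg / marg.sum(axis=1, keepdims=True)


def bp_marginals(unary, pair, kT=1.0, n_iters=50):
    """Sum-product belief propagation with synchronous message updates."""
    n, k = unary.shape
    phi = np.exp(-unary / kT)  # single-position potentials
    psi = {}                   # pairwise potentials, both directions
    for (i, j), e in pair.items():
        psi[(i, j)] = np.exp(-e / kT)    # indexed [s_i, s_j]
        psi[(j, i)] = np.exp(-e.T / kT)  # indexed [s_j, s_i]
    msgs = {edge: np.full(k, 1.0 / k) for edge in psi}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in psi:
            # Combine phi_i with all messages into i except the one from j.
            incoming = phi[i].copy()
            for (h, t) in psi:
                if t == i and h != j:
                    incoming *= msgs[(h, i)]
            m = incoming @ psi[(i, j)]  # sum over the states of position i
            new_msgs[(i, j)] = m / m.sum()
        msgs = new_msgs
    beliefs = phi.copy()
    for (h, t) in psi:
        beliefs[t] *= msgs[(h, t)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


# Toy problem: 4 positions, 3 residue types, chain-shaped interactions.
rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 3))
pair = {(i, i + 1): rng.normal(size=(3, 3)) for i in range(3)}
print(np.allclose(exact_marginals(unary, pair), bp_marginals(unary, pair)))
```

Enumeration costs k^n energy evaluations, while each BP iteration only touches the edges of the interaction graph. On graphs with cycles, BP is an approximation and this agreement is not guaranteed.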

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51b8/2718646/40dfc9f154c4/btn168f1.jpg
