A Rosetta-based protein design protocol converging to natural sequences.

Author information

SISSA, Via Bonomea 265, Trieste, Italy.

Institute of Bioengineering, Ecole Polytechnique Federale de Lausanne, Lausanne CH-1015, Switzerland and Swiss Institute of Bioinformatics (SIB), Lausanne CH-1015, Switzerland.

Publication information

J Chem Phys. 2021 Feb 21;154(7):074114. doi: 10.1063/5.0039240.

DOI:10.1063/5.0039240

PMID:33607903

Abstract

Computational protein design has emerged as a powerful tool capable of identifying sequences compatible with pre-defined protein structures. The sequence design protocols, implemented in the Rosetta suite, have become widely used in the protein engineering community. To understand the strengths and limitations of the Rosetta design framework, we tested several design protocols on two distinct folds (SH3-1 and Ubiquitin). The sequence optimization, when started from native structures and natural sequences or polyvaline sequences, converges to sequences that are not recognized as belonging to the fold family of the target protein by standard bioinformatic tools, such as BLAST and Hmmer. The sequences generated from both starting conditions (native and polyvaline) are instead very similar to each other and recognized by Hmmer as belonging to the same "family." This demonstrates the capability of Rosetta to converge to similar sequences, even when sampling from distinct starting conditions, but, on the other hand, shows intrinsic inaccuracy of the scoring function that drifts toward sequences that lack identifiable natural sequence signatures. To address this problem, we developed a protocol embedding Rosetta Design simulations in a genetic algorithm, in which the sequence search is biased to converge to sequences that exist in nature. This protocol allows us to obtain sequences that have recognizable natural sequence signatures and, experimentally, the designed proteins are biochemically well behaved and thermodynamically stable.
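The abstract does not spell out how the genetic algorithm biases the search, so the following is only a minimal sketch of the general idea, not the authors' protocol. It assumes a toy "energy" (mismatches to a hypothetical polyvaline optimum) in place of a Rosetta design score, and a toy "naturalness" penalty (mismatches to a hypothetical family consensus) in place of a profile-based score such as one derived from HMMER. The names `ENERGY_OPTIMUM`, `FAMILY_CONSENSUS`, `bias_weight`, and `run_ga` are all illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-ins (not from the paper): a sequence that the toy
# "energy" favours, and a toy family consensus that plays the role of a
# natural-sequence signature.
ENERGY_OPTIMUM = "VVVVVVVVVVVV"
FAMILY_CONSENSUS = "ALYDYEARTEDE"


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def fitness(seq, bias_weight):
    """Lower is better: toy energy plus a weighted distance from the family."""
    energy = mismatches(seq, ENERGY_OPTIMUM)       # placeholder for a design score
    unnatural = mismatches(seq, FAMILY_CONSENSUS)  # placeholder for a profile score
    return energy + bias_weight * unnatural


def mutate(seq, rng, rate=0.05):
    """Resample each position with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(bias_weight, generations=100, pop_size=50, seed=0):
    rng = random.Random(seed)
    length = len(FAMILY_CONSENSUS)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism)
        # and produces the other half by crossover and mutation.
        population.sort(key=lambda s: fitness(s, bias_weight))
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda s: fitness(s, bias_weight))


unbiased = run_ga(bias_weight=0.0)
biased = run_ga(bias_weight=3.0)
print("unbiased:", unbiased, mismatches(unbiased, FAMILY_CONSENSUS))
print("biased:  ", biased, mismatches(biased, FAMILY_CONSENSUS))
```

With `bias_weight=0.0` the search follows the toy energy alone and drifts away from the consensus, loosely mirroring the drift the abstract reports for unbiased design. A positive weight pulls the population back toward family-like sequences. In a real setting both scoring terms would be replaced by actual Rosetta and sequence-profile evaluations.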

Similar articles

A Rosetta-based protein design protocol converging to natural sequences.

J Chem Phys. 2021 Feb 21;154(7):074114. doi: 10.1063/5.0039240.

Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network.

PLoS One. 2021 Aug 26;16(8):e0256691. doi: 10.1371/journal.pone.0256691. eCollection 2021.

Rosetta design with co-evolutionary information retains protein function.

PLoS Comput Biol. 2021 Jan 19;17(1):e1008568. doi: 10.1371/journal.pcbi.1008568. eCollection 2021 Jan.

Protein sequence design by conformational landscape optimization.

Proc Natl Acad Sci U S A. 2021 Mar 16;118(11). doi: 10.1073/pnas.2017228118.

Folding free energy function selects native-like protein sequences in the core but not on the surface.

Proc Natl Acad Sci U S A. 2002 Oct 15;99(21):13554-9. doi: 10.1073/pnas.212068599. Epub 2002 Oct 4.

Improved recognition of native-like protein structures using a family of designed sequences.

Proc Natl Acad Sci U S A. 2002 Jan 22;99(2):691-6. doi: 10.1073/pnas.022408799. Epub 2002 Jan 8.

Improved design of stable and fast-folding model proteins.

Fold Des. 1996;1(3):221-30. doi: 10.1016/S1359-0278(96)00033-8.

Denatured state is critical in determining the properties of model proteins designed on different folds.

Proteins. 2008 Feb 15;70(3):1047-55. doi: 10.1002/prot.21599.

Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.

J Struct Biol. 2018 Jul;203(1):54-61. doi: 10.1016/j.jsb.2018.02.004. Epub 2018 Feb 14.

RosettaAntibodyDesign (RAbD): A general framework for computational antibody design.

PLoS Comput Biol. 2018 Apr 27;14(4):e1006112. doi: 10.1371/journal.pcbi.1006112. eCollection 2018 Apr.

Cited by

Protein-protein interaction prediction with deep learning: A comprehensive review.

Comput Struct Biotechnol J. 2022 Sep 19;20:5316-5341. doi: 10.1016/j.csbj.2022.08.070. eCollection 2022.

Frustration Dynamics and Electron-Transfer Reorganization Energies in Wild-Type and Mutant Azurins.

J Am Chem Soc. 2022 Mar 9;144(9):4178-4185. doi: 10.1021/jacs.1c13454. Epub 2022 Feb 16.
