A Rosetta-based protein design protocol converging to natural sequences.

Affiliations

SISSA, Via Bonomea 265, Trieste, Italy.

Institute of Bioengineering, Ecole Polytechnique Federale de Lausanne, Lausanne CH-1015, Switzerland and Swiss Institute of Bioinformatics (SIB), Lausanne CH-1015, Switzerland.

Publication information

J Chem Phys. 2021 Feb 21;154(7):074114. doi: 10.1063/5.0039240.

Abstract

Computational protein design has emerged as a powerful tool capable of identifying sequences compatible with pre-defined protein structures. The sequence design protocols, implemented in the Rosetta suite, have become widely used in the protein engineering community. To understand the strengths and limitations of the Rosetta design framework, we tested several design protocols on two distinct folds (SH3-1 and Ubiquitin). The sequence optimization, when started from native structures and natural sequences or polyvaline sequences, converges to sequences that are not recognized as belonging to the fold family of the target protein by standard bioinformatic tools, such as BLAST and Hmmer. The sequences generated from both starting conditions (native and polyvaline) are instead very similar to each other and recognized by Hmmer as belonging to the same "family." This demonstrates the capability of Rosetta to converge to similar sequences, even when sampling from distinct starting conditions, but, on the other hand, shows intrinsic inaccuracy of the scoring function that drifts toward sequences that lack identifiable natural sequence signatures. To address this problem, we developed a protocol embedding Rosetta Design simulations in a genetic algorithm, in which the sequence search is biased to converge to sequences that exist in nature. This protocol allows us to obtain sequences that have recognizable natural sequence signatures and, experimentally, the designed proteins are biochemically well behaved and thermodynamically stable.
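The abstract does not specify the genetic algorithm's operators, its fitness function, or how the bias toward natural sequences is computed. As a rough illustration of the general idea only, the sketch below runs a toy genetic algorithm whose fitness combines a design-energy term with a reward for resembling known family members. Everything in it is an assumption: `toy_energy` is a made-up stand-in for a Rosetta score, `natural_similarity` is a made-up stand-in for a family-recognition tool such as Hmmer, the `weight` parameter and the two `TOY_FAMILY` sequences are invented for the example, and no Rosetta or Hmmer code is called.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Invented example sequences standing in for a natural family (assumption).
TOY_FAMILY = ["MKIFVKTLTGKEITLEVEPS", "MQVFLKTLSGKTITVDVEAS"]


def toy_energy(seq):
    """Hypothetical stand-in for a design energy; lower is better.

    Penalises deviation from an arbitrary 40% hydrophobic fraction.
    """
    hydrophobic = sum(seq.count(a) for a in "AILMFVW")
    return abs(hydrophobic / len(seq) - 0.4)


def natural_similarity(seq, family):
    """Hypothetical stand-in for family recognition.

    Returns the best fractional identity to any family member.
    """
    return max(
        sum(a == b for a, b in zip(seq, ref)) / len(seq) for ref in family
    )


def fitness(seq, family, weight):
    """Higher is better: low energy plus a weighted natural-likeness bias."""
    return -toy_energy(seq) + weight * natural_similarity(seq, family)


def mutate(seq, rng, rate=0.05):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else a for a in seq
    )


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(start, family, weight=1.0, pop_size=30, generations=50, seed=0):
    """Evolve sequences from `start`, keeping the fitter half each round."""
    rng = random.Random(seed)
    # Keep the unmodified start sequence in the initial population.
    population = [start] + [
        mutate(start, rng, rate=0.2) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, family, weight), reverse=True)
        parents = population[: pop_size // 2]  # elitist selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda s: fitness(s, family, weight))
```

Calling `evolve("V" * 20, TOY_FAMILY)` starts from a polyvaline sequence, one of the two starting conditions the abstract mentions. Because the fitter half of each generation is carried over unchanged, the returned sequence never scores worse than the starting one under this toy fitness. Setting `weight=0.0` removes the bias term, which loosely mirrors an unbiased energy-only search.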
