Yim Jason, Campbell Andrew, Mathieu Emile, Foong Andrew Y K, Gastegger Michael, Jiménez-Luna José, Lewis Sarah, Satorras Victor Garcia, Veeling Bastiaan S, Noé Frank, Barzilay Regina, Jaakkola Tommi S
ArXiv. 2024 Jul 18:arXiv:2401.04082v2.
Protein design often begins with knowledge of a desired function, embodied in a motif; motif-scaffolding aims to construct a functional protein around that motif. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow and requires no additional training. On a benchmark of 24 biologically meaningful motifs, we show that our method achieves 2.5 times more designable and unique motif-scaffolds than the state of the art. Code: https://github.com/microsoft/protein-frame-flow
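The abstract does not spell out the data augmentation behind motif amortization. One plausible reading is that, for each training backbone, a random subset of residues is designated as the "motif" and supplied as conditioning while the rest is treated as scaffold. The sketch below illustrates that idea only. The function name `sample_motif_mask`, its parameters, and the default fractions and segment counts are hypothetical and are not taken from the FrameFlow code.

```python
import random


def sample_motif_mask(num_residues, min_frac=0.05, max_frac=0.5,
                      max_segments=4, rng=None):
    """Return a list of booleans; True marks a residue treated as motif.

    Hypothetical augmentation: all defaults are illustrative assumptions.
    Requires num_residues >= 1.
    """
    rng = rng or random.Random()

    # Total number of motif residues (at least one).
    target = max(1, int(num_residues * rng.uniform(min_frac, max_frac)))

    # Split the motif residues into 1..max_segments contiguous segments.
    num_segments = rng.randint(1, min(max_segments, target))
    cuts = sorted(rng.sample(range(1, target), num_segments - 1))
    lengths = [b - a for a, b in zip([0] + cuts, cuts + [target])]

    # Distribute the remaining (scaffold) residues as random gaps
    # placed before each segment; any leftover trails the last segment.
    free = num_residues - target
    gap_cuts = sorted(rng.randint(0, free) for _ in range(num_segments))

    mask = [False] * num_residues
    pos, prev_cut = 0, 0
    for length, cut in zip(lengths, gap_cuts):
        pos += cut - prev_cut          # scaffold gap before this segment
        prev_cut = cut
        for i in range(pos, pos + length):
            mask[i] = True             # motif residue
        pos += length
    return mask


if __name__ == "__main__":
    mask = sample_motif_mask(60, rng=random.Random(0))
    print("".join("M" if m else "." for m in mask))
```

In a training loop built on this assumption, residues where the mask is True would be kept at their ground-truth frames and passed to the model as the motif, while the remaining residues would be noised and regressed as usual.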