Department of Chemistry, University of Chicago, Chicago, IL 60637, USA.
Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637, USA.
Cell Syst. 2024 Aug 21;15(8):725-737.e7. doi: 10.1016/j.cels.2024.07.005. Epub 2024 Aug 5.
Evolution-based deep generative models represent an exciting direction in understanding and designing proteins. An open question is whether such models can learn specialized functional constraints that control fitness in specific biological contexts. Here, we examine the ability of generative models to produce synthetic versions of Src-homology 3 (SH3) domains that mediate signaling in the Sho1 osmotic stress response pathway of yeast. We show that a variational autoencoder (VAE) model produces artificial sequences that experimentally recapitulate the function of natural SH3 domains. More generally, the model organizes all fungal SH3 domains such that locality in the model latent space (but not simply locality in sequence space) enriches the design of synthetic orthologs and exposes non-obvious amino acid constraints distributed near and far from the SH3 ligand-binding site. The ability of generative models to design ortholog-like functions in vivo opens new avenues for engineering protein function in specific cellular contexts and environments.