Unsupervised evolution of protein and antibody complexes with a structure-informed language model.

Affiliations

Stanford Biophysics Program, Stanford University School of Medicine, Stanford, CA 94305, USA.

Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA 94305, USA.

Publication information

Science. 2024 Jul 5;385(6704):46-53. doi: 10.1126/science.adk8946. Epub 2024 Jul 4.

Abstract

Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
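The abstract does not spell out how candidate variants are chosen, so the following is only a minimal sketch of one common way a structure-conditioned language model could be used to prioritize mutations: rank each single substitution by its log-likelihood ratio against the wild-type residue. The functions `rank_substitutions` and `toy_log_probs`, the example sequence, and the toy probability table are all hypothetical stand-ins; a real workflow would take per-position log-probabilities from a model such as ESM-IF1 conditioned on backbone coordinates.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_substitutions(wild_type, log_probs, top_k=5):
    """Rank single substitutions by log p(mutant) - log p(wild type).

    log_probs[i] maps each amino acid to a log-probability at position i.
    Here the table is assumed to come from a model conditioned on the
    backbone structure; this function only does the ranking.
    """
    scored = []
    for i, wt in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            score = log_probs[i][aa] - log_probs[i][wt]
            # Conventional mutation name: wild-type residue, 1-based position, new residue.
            scored.append((score, f"{wt}{i + 1}{aa}"))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


def toy_log_probs(wild_type, favored):
    """Build toy per-position distributions. NOT real model output."""
    table = []
    for i, wt in enumerate(wild_type):
        weights = {aa: 1.0 for aa in AMINO_ACIDS}
        weights[wt] = 5.0                 # wild type is usually likely
        if i in favored:
            weights[favored[i]] = 10.0    # pretend the model prefers this residue
        total = sum(weights.values())
        table.append({aa: math.log(w / total) for aa, w in weights.items()})
    return table


wild_type = "MKTAY"                       # hypothetical five-residue sequence
log_probs = toy_log_probs(wild_type, favored={2: "S"})
top = rank_substitutions(wild_type, log_probs, top_k=3)
for name, score in top:
    print(f"{name}\t{score:+.3f}")
```

Under this scheme only substitutions the model scores above the wild type get a positive value, which is one way a small experimental screen (the abstract mentions about 30 variants) could be drawn from a much larger mutational space without task-specific training data.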
