Rapid in silico directed evolution by a protein language model with EVOLVEpro.

Author information

Jiang Kaiyi, Yan Zhaoqing, Di Bernardo Matteo, Sgrizzi Samantha R, Villiger Lukas, Kayabolen Alisan, Kim B J, Carscadden Josephine K, Hiraizumi Masahiro, Nishimasu Hiroshi, Gootenberg Jonathan S, Abudayyeh Omar O

Affiliations

Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA.

Publication information

Science. 2025 Jan 24;387(6732):eadr6006. doi: 10.1126/science.adr6006.

DOI:10.1126/science.adr6006

PMID:39571002

Abstract

Directed protein evolution is central to biomedical applications but faces challenges such as experimental complexity, inefficient multiproperty optimization, and local maxima traps. Although in silico methods that use protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and map to protein activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of few-shot active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for artificial intelligence-guided protein engineering in biology and medicine.
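
The few-shot active learning loop described in the abstract (embed variants, fit a regression model on a small set of measured activities, test the top-ranked candidates, repeat) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `toy_embed` is a hypothetical stand-in for a protein language model embedding, `toy_assay` stands in for a wet-lab measurement, and the k-nearest-neighbour regressor is an assumed choice, since the abstract does not say which regression model is used.

```python
import random


def toy_embed(variant_id, dim=8):
    """Hypothetical stand-in for a PLM embedding: a fixed random vector per variant."""
    rng = random.Random(variant_id)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_assay(embedding, weights):
    """Hypothetical stand-in for an experiment: activity is a hidden linear function."""
    return sum(w * x for w, x in zip(weights, embedding))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Assumed regressor: mean activity of the k nearest measured variants."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


def active_learning(embeddings, assay, rounds=4, batch=8, seed=0):
    """Run `rounds` of measure -> fit -> rank -> select; return {variant_id: activity}."""
    rng = random.Random(seed)
    unmeasured = set(embeddings)
    measured = {}
    for round_index in range(rounds):
        if round_index == 0:
            # Round 0: no data yet, so start from a random batch.
            picks = rng.sample(sorted(unmeasured), batch)
        else:
            # Later rounds: rank unmeasured variants by predicted activity.
            train = [(embeddings[v], y) for v, y in measured.items()]
            ranked = sorted(
                sorted(unmeasured),
                key=lambda v: knn_predict(train, embeddings[v]),
                reverse=True,
            )
            picks = ranked[:batch]
        for v in picks:
            measured[v] = assay(embeddings[v])
            unmeasured.discard(v)
    return measured


# Toy setup: 200 hypothetical variants and a hidden activity function.
hidden_weights = [random.Random(12345).gauss(0.0, 1.0) for _ in range(8)]
embeddings = {v: toy_embed(v) for v in range(200)}
measured = active_learning(embeddings, lambda e: toy_assay(e, hidden_weights))
best_variant = max(measured, key=measured.get)
```

Only 32 of the 200 toy variants are ever "measured" here, which mirrors the abstract's emphasis on minimal experimental data. In a real setting the embeddings would come from an actual PLM and the assay from experiments.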
