Protein CREATE enables closed-loop design of synthetic protein binders.

Author information

Lourenço Alec, Subramanian Arjuna, Spencer Ryan, Anaya Michael, Miao Jiapei, Fu William, Chow Eric, Thomson Matt

Affiliations

Caltech.

Eli Lilly.

Publication information

bioRxiv. 2025 Jan 2:2024.12.20.629847. doi: 10.1101/2024.12.20.629847.

Abstract

Proteins have proven to be useful agents in a variety of fields, from serving as potent therapeutics to enabling complex catalysis for chemical manufacture. However, they remain difficult to design and are instead typically selected for using extensive screens or directed evolution. Recent developments in protein large language models have enabled fast generation of diverse protein sequences in unexplored regions of protein space that are predicted to fold into varied structures, bind relevant targets, and catalyze novel reactions. Nevertheless, we lack methods to characterize these proteins experimentally at scale and to update generative models based on those results. We describe Protein CREATE (Computational Redesign via an Experiment-Augmented Training Engine), an integrated computational and experimental pipeline. Its experimental workflow combines next-generation sequencing and phage display with single-molecule readouts to collect large amounts of quantitative binding data for updating protein large language models. We use Protein CREATE to generate and assay thousands of designed binders to the IL-7 receptor and the insulin receptor, with parallel positive and negative selections to identify on-target binders. We discover not only individual novel binders but also features of ligand-receptor binding, including specific preservation of the IL7R-ligand hydrophobic interface and the existence of multiple approaches to contacting the insulin receptor. We also demonstrate the importance of structural features, such as the lack of unpaired cysteine residues, to design fidelity, and find that computational pre-screening metrics, such as interchain predicted TM scoring (iPTM), while useful, are imperfect predictors: they neither guarantee experimental binding nor rule it out. We use the data collected from Protein CREATE to score designs from the initial generative models. Globally, Protein CREATE will power future closed-loop design-build-test cycles to enable fine-grained design of protein binders.
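
The abstract does not say how sequencing reads from the parallel positive and negative selections are turned into on-target calls. As a rough illustration of the general idea only, and not the authors' analysis, the sketch below computes a per-design log2 enrichment of read frequency in the positive selection relative to the negative selection. The function name, the pseudocount, the read counts, and the threshold of 1.0 are all hypothetical.

```python
import math

def log2_enrichment(pos_counts, neg_counts, pseudocount=1.0):
    """Per-design log2 ratio of read frequency: positive vs. negative selection.

    pos_counts / neg_counts map a design ID to its read count in each selection.
    A pseudocount (an assumption, not from the paper) avoids division by zero
    for designs that are missing from one of the two selections.
    """
    designs = set(pos_counts) | set(neg_counts)
    # Normalize by total reads so that sequencing depth does not bias the ratio.
    pos_total = sum(pos_counts.values()) + pseudocount * len(designs)
    neg_total = sum(neg_counts.values()) + pseudocount * len(designs)
    scores = {}
    for design in designs:
        pos_freq = (pos_counts.get(design, 0) + pseudocount) / pos_total
        neg_freq = (neg_counts.get(design, 0) + pseudocount) / neg_total
        scores[design] = math.log2(pos_freq / neg_freq)
    return scores

# Hypothetical read counts for three designs (illustrative values only).
pos = {"design_A": 900, "design_B": 50, "design_C": 50}
neg = {"design_A": 100, "design_B": 450, "design_C": 450}

scores = log2_enrichment(pos, neg)
# Call a design "on-target" if it is more than 2-fold enriched (hypothetical cutoff).
on_target = sorted(d for d, s in scores.items() if s > 1.0)
print(on_target)
```

In a real pipeline the cutoff would need calibration against known binders and non-binders, and replicate selections would be needed to estimate noise in the counts.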

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4af/11722223/3ea378f12b2c/nihpp-2024.12.20.629847v2-f0001.jpg
