Learning the language of antibody hypervariability.

Authors

Singh Rohit, Im Chiho, Qiu Yu, Mackness Brian, Gupta Abhinav, Joren Taylor, Sledzieski Samuel, Erlach Lena, Wendt Maria, Fomekong Nanfack Yves, Bryson Bryan, Berger Bonnie

Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Sanofi R&D Large Molecule Research, Cambridge, MA 02141.

Publication information

Proc Natl Acad Sci U S A. 2025 Jan 7;122(1):e2418918121. doi: 10.1073/pnas.2418918121. Epub 2024 Dec 30.

DOI:10.1073/pnas.2418918121

PMID:39793083

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11725859/

Abstract

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. Our learned feature representations accurately predict mutational effects on antigen binding, paratope identification, and other key antibody properties. We experimentally validate AbMAP for antibody optimization by applying it to refine a set of antibodies that bind to a SARS-CoV-2 peptide, and obtain an 82% hit-rate and up to 22-fold increase in binding affinity. AbMAP also unlocks large-scale analyses of immune repertoires, revealing that B-cell receptor repertoires of individuals, while remarkably different in sequence, converge toward similar structural and functional coverage. Importantly, AbMAP's transfer learning approach can be readily adapted to advances in foundational PLMs. We anticipate AbMAP will accelerate the efficient design and modeling of antibodies, expedite the discovery of antibody-based therapeutics, and deepen our understanding of humoral immunity.
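
The transfer-learning idea in the abstract (reuse a general-purpose model's features and supervise a smaller component on antibody-specific labels) can be illustrated with a toy sketch. This is not AbMAP's architecture or training procedure: the fixed random per-residue table stands in for a frozen foundational PLM, the linear head and SGD loop are a simplification chosen for illustration, and the sequences and "binding scores" are synthetic. All names here (`make_frozen_table`, `embed`, `train_head`) are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 8  # toy embedding width (assumption; real PLM embeddings are far larger)

def make_frozen_table(seed=0):
    """Hypothetical stand-in for a frozen foundational PLM: one fixed vector per residue."""
    rng = random.Random(seed)
    return {a: [rng.gauss(0, 1) for _ in range(DIM)] for a in AMINO_ACIDS}

def embed(seq, table):
    """Mean-pool the fixed per-residue vectors into one feature vector per sequence."""
    return [sum(table[a][k] for a in seq) / len(seq) for k in range(DIM)]

def predict(w, x):
    """Linear head: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, X, y):
    """Mean squared error of the linear head on a dataset."""
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def train_head(X, y, lr=0.1, epochs=200):
    """Fit only the linear head by SGD; the embedding table is never updated."""
    w = [0.0] * DIM
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = predict(w, x) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

rng = random.Random(1)
table = make_frozen_table()

# Synthetic 12-residue sequences and synthetic scores (toy data, not from the paper).
seqs = ["".join(rng.choice(AMINO_ACIDS) for _ in range(12)) for _ in range(120)]
X = [embed(s, table) for s in seqs]
true_w = [rng.gauss(0, 1) for _ in range(DIM)]
y = [predict(true_w, x) for x in X]

# Train on the first 90 sequences, evaluate on the held-out 30.
w = train_head(X[:90], y[:90])
baseline_mse = mse([0.0] * DIM, X[90:], y[90:])  # untrained head
head_mse = mse(w, X[90:], y[90:])                # supervised head
print(f"untrained head MSE: {baseline_mse:.4f}, trained head MSE: {head_mse:.6f}")
```

Keeping the feature extractor fixed and training only a small supervised component is what would let such a setup be re-run on top of a newer foundational model, which is the adaptability the abstract highlights.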

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fbe/11725859/aa5b35bd5348/pnas.2418918121fig01.jpg

Similar articles

Comprehensive characterization of the antibody responses to SARS-CoV-2 Spike protein finds additional vaccine-induced epitopes beyond those for mild infection.

Elife. 2022 Jan 24;11:e73490. doi: 10.7554/eLife.73490.

Structural Basis of a Human Neutralizing Antibody Specific to the SARS-CoV-2 Spike Protein Receptor-Binding Domain.

Microbiol Spectr. 2021 Oct 31;9(2):e0135221. doi: 10.1128/Spectrum.01352-21. Epub 2021 Oct 13.

De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model.

Nat Commun. 2024 Aug 10;15(1):6867. doi: 10.1038/s41467-024-50903-y.

AlphaFold2 Modeling and Molecular Dynamics Simulations of the Conformational Ensembles for the SARS-CoV-2 Spike Omicron JN.1, KP.2 and KP.3 Variants: Mutational Profiling of Binding Energetics Reveals Epistatic Drivers of the ACE2 Affinity and Escape Hotspots of Antibody Resistance.

Viruses. 2024 Sep 13;16(9):1458. doi: 10.3390/v16091458.

AntiFormer: graph enhanced large language model for binding affinity prediction.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae403.

Analysis of B Cell Receptor Repertoires Reveals Key Signatures of the Systemic B Cell Response after SARS-CoV-2 Infection.

J Virol. 2022 Feb 23;96(4):e0160021. doi: 10.1128/JVI.01600-21. Epub 2021 Dec 8.

Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants.

Elife. 2020 Oct 28;9:e61312. doi: 10.7554/eLife.61312.

Rapid discovery of diverse neutralizing SARS-CoV-2 antibodies from large-scale synthetic phage libraries.

MAbs. 2022 Jan-Dec;14(1):2002236. doi: 10.1080/19420862.2021.2002236.

Unsupervised evolution of protein and antibody complexes with a structure-informed language model.

Science. 2024 Jul 5;385(6704):46-53. doi: 10.1126/science.adk8946. Epub 2024 Jul 4.

Cited by

Artificial intelligence in antibody design and development: harnessing the power of computational approaches.

Med Biol Eng Comput. 2025 Sep 1. doi: 10.1007/s11517-025-03429-4.

SALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning.

Research (Wash D C). 2025 Aug 19;8:0721. doi: 10.34133/research.0721. eCollection 2025.

Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf418.

Nucleotide context models outperform protein language models for predicting antibody affinity maturation.

bioRxiv. 2025 Jun 18:2025.06.16.659977. doi: 10.1101/2025.06.16.659977.

Aggregating residue-level protein language model embeddings with optimal transport.

Bioinform Adv. 2025 Mar 20;5(1):vbaf060. doi: 10.1093/bioadv/vbaf060. eCollection 2025.

Learning the language of protein-protein interactions.

bioRxiv. 2025 Mar 18:2025.03.09.642188. doi: 10.1101/2025.03.09.642188.

References

One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data.

Pac Symp Biocomput. 2025;30:580-593. doi: 10.1142/9789819807024_0041.

De novo design of protein structure and function with RFdiffusion.

Nature. 2023 Aug;620(7976):1089-1100. doi: 10.1038/s41586-023-06415-8. Epub 2023 Jul 11.

Evolutionary-scale prediction of atomic-level protein structure with a language model.

Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.

AbLang: an antibody language model for completing antibody sequences.

Bioinform Adv. 2022 Jun 17;2(1):vbac046. doi: 10.1093/bioadv/vbac046. eCollection 2022.

A dataset comprised of binding interactions for 104,972 antibodies against a SARS-CoV-2 peptide.

Sci Data. 2022 Oct 26;9(1):653. doi: 10.1038/s41597-022-01779-4.

Deciphering the language of antibodies using self-supervised learning.

Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.

Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization.

Proc Natl Acad Sci U S A. 2022 Mar 15;119(11):e2122954119. doi: 10.1073/pnas.2122954119. Epub 2022 Mar 1.

Learning protein fitness models from evolutionary and assay-labeled data.

Nat Biotechnol. 2022 Jul;40(7):1114-1122. doi: 10.1038/s41587-021-01146-5. Epub 2022 Jan 17.

Current strategies for detecting functional convergence across B-cell receptor repertoires.

MAbs. 2021 Jan-Dec;13(1):1996732. doi: 10.1080/19420862.2021.1996732.

Learning the protein language: Evolution, structure, and function.

Cell Syst. 2021 Jun 16;12(6):654-669.e3. doi: 10.1016/j.cels.2021.05.017.
