Rohit Singh, Chiho Im, Yu Qiu, Brian Mackness, Abhinav Gupta, Taylor Joren, Samuel Sledzieski, Lena Erlach, Maria Wendt, Yves Fomekong Nanfack, Bryan Bryson, Bonnie Berger
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
Sanofi R&D Large Molecule Research, Cambridge, MA 02141.
Proc Natl Acad Sci U S A. 2025 Jan 7;122(1):e2418918121. doi: 10.1073/pnas.2418918121. Epub 2024 Dec 30.
Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose "foundational" PLMs have limited performance in modeling antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. Our learned feature representations accurately predict mutational effects on antigen binding, paratope identification, and other key antibody properties. We experimentally validate AbMAP for antibody optimization by applying it to refine a set of antibodies that bind to a SARS-CoV-2 peptide, and obtain an 82% hit-rate and up to 22-fold increase in binding affinity. AbMAP also unlocks large-scale analyses of immune repertoires, revealing that B-cell receptor repertoires of individuals, while remarkably different in sequence, converge toward similar structural and functional coverage. Importantly, AbMAP's transfer learning approach can be readily adapted to advances in foundational PLMs. We anticipate AbMAP will accelerate the efficient design and modeling of antibodies, expedite the discovery of antibody-based therapeutics, and deepen our understanding of humoral immunity.