Holt Clinton M, Janke Alexis K, Amlashi Parastoo, Jamieson Parker J, Marinov Toma M, Georgiev Ivelin S
Vanderbilt Center for Antibody Therapeutics, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
Program in Chemical and Physical Biology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
bioRxiv. 2025 Apr 1:2025.02.25.640114. doi: 10.1101/2025.02.25.640114.
Computational epitope prediction remains an unmet need in therapeutic antibody development. We present three complementary approaches for predicting epitope relationships from antibody amino acid sequences. First, we analyze ~18 million antibody pairs targeting ~250 protein families and establish that, among antibodies sharing both heavy- and light-chain V genes, a threshold of >70% CDRH3 sequence identity reliably predicts overlapping-epitope antibody pairs. Next, we develop a supervised contrastive fine-tuning framework for antibody large language models that yields embeddings more strongly correlated with epitope information than those from pretrained models. Applying this contrastive learning approach to SARS-CoV-2 receptor binding domain antibodies, we achieve 82.7% balanced accuracy in distinguishing same-epitope from different-epitope antibody pairs and show that learning on functional epitope bins can predict relative levels of structural overlap (Spearman's ρ = 0.25). Finally, we create AbLang-PDB, a generalized model for predicting overlapping-epitope antibodies across a broad range of protein families. AbLang-PDB achieves a five-fold improvement in average precision for predicting overlapping-epitope antibody pairs compared with sequence-based methods, and effectively predicts the amount of epitope overlap among overlapping-epitope pairs (correlation coefficient = 0.81). In an antibody discovery campaign searching for antibodies whose epitopes overlap that of the HIV-1 broadly neutralizing antibody 8ANC195, 70% of computationally selected candidates demonstrated HIV-1 specificity, with 50% showing competitive binding with 8ANC195. Together, the computational models presented here provide powerful tools for epitope-targeted antibody discovery and demonstrate the efficacy of contrastive learning for improving epitope representation.
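The first approach reduces to a simple pairwise decision rule, which can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract specifies only two things: the shared heavy- and light-chain V-gene requirement and the >70% CDRH3 identity threshold. Everything else is an assumption made for the sketch: the hypothetical `Antibody` record, comparing V genes with the allele suffix (`*01`) stripped, and defining CDRH3 identity as 1 minus the Levenshtein distance divided by the longer sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    """Hypothetical minimal antibody record (field names are assumptions)."""
    heavy_v: str  # heavy-chain V call, e.g. "IGHV1-69*01"
    light_v: str  # light-chain V call, e.g. "IGKV3-20*01"
    cdrh3: str    # CDRH3 amino acid sequence


def v_gene(v_call: str) -> str:
    """Strip the allele suffix so alleles of one gene compare equal (assumption)."""
    return v_call.split("*")[0]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if match)
        prev = curr
    return prev[-1]


def cdrh3_identity(a: str, b: str) -> float:
    """Identity as 1 - distance / longer length (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def predict_overlapping(x: Antibody, y: Antibody, threshold: float = 0.70) -> bool:
    """Predict an overlapping-epitope pair: same heavy and light V genes
    and CDRH3 identity strictly above the threshold (>70%)."""
    return (
        v_gene(x.heavy_v) == v_gene(y.heavy_v)
        and v_gene(x.light_v) == v_gene(y.light_v)
        and cdrh3_identity(x.cdrh3, y.cdrh3) > threshold
    )
```

For example, two hypothetical antibodies that use IGHV1-69 and IGKV3-20 and whose 13-residue CDRH3s differ at a single position (identity 12/13 ≈ 0.92) would be predicted to overlap. The same pair with different heavy-chain V genes would not, however similar the CDRH3s. The edit-distance definition is used here only so that CDRH3s of unequal length can be compared. A position-wise identity over equal-length CDRH3s is an equally plausible reading of the rule.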