
Improving antibody language models with native pairing.

Authors

Burbach Sarah M, Briney Bryan

Affiliations

Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Center for Viral Systems Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Publication

Patterns (N Y). 2024 Apr 4;5(5):100967. doi: 10.1016/j.patter.2024.100967. eCollection 2024 May 10.

DOI: 10.1016/j.patter.2024.100967
PMID: 38800360
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11117052/
Abstract

Existing antibody language models are limited by their use of unpaired antibody sequence data. A recently published dataset of ∼1.6 × 10⁶ natively paired human antibody sequences offers a unique opportunity to evaluate how antibody language models are improved by training with native pairs. We trained three baseline antibody language models (BALM), using natively paired (BALM-paired), randomly paired (BALM-shuffled), or unpaired (BALM-unpaired) sequences from this dataset. To address the paucity of paired sequences, we additionally fine-tuned ESM (evolutionary scale modeling)-2 with natively paired antibody sequences (ft-ESM). We provide evidence that training with native pairs allows the model to learn immunologically relevant features that span the light and heavy chains, which cannot be simulated by training with random pairs. We additionally show that training with native pairs improves model performance on a variety of metrics, including the ability of the model to classify antibodies by pathogen specificity.
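The three training corpora described in the abstract differ only in how heavy and light chains are combined. A minimal sketch of that data construction (my illustration, not the authors' code; the `</s>` separator token and the function name are assumptions): native pairs join each heavy chain to its own light chain, shuffled pairs join it to a randomly permuted light chain so cross-chain signal is broken while sequence composition is preserved, and the unpaired corpus treats every chain as an independent example.

```python
import random

def build_corpora(pairs, sep="</s>", seed=0):
    """Build native-paired, shuffled-paired, and unpaired training
    corpora from a list of (heavy, light) amino-acid sequence pairs.

    native:   each heavy chain concatenated with its own light chain
    shuffled: heavy chains concatenated with randomly permuted light
              chains (same chains, broken pairing)
    unpaired: every chain is a separate training example
    """
    heavies = [h for h, _ in pairs]
    lights = [l for _, l in pairs]

    native = [h + sep + l for h, l in pairs]

    shuffled_lights = lights[:]
    random.Random(seed).shuffle(shuffled_lights)  # deterministic control
    shuffled = [h + sep + l for h, l in zip(heavies, shuffled_lights)]

    unpaired = heavies + lights
    return native, shuffled, unpaired

# Toy example with placeholder (not real) sequence fragments
pairs = [("EVQLV", "DIQMT"), ("QVQLQ", "EIVLT"), ("EVQLL", "DIVMT")]
native, shuffled, unpaired = build_corpora(pairs)
```

The shuffled corpus is the key control: because it contains exactly the same chains as the native corpus, any performance gap between BALM-paired and BALM-shuffled can be attributed to the pairing itself rather than to sequence content.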


Figures (gr1–gr5):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ec2/11117052/90967f5d6a75/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ec2/11117052/15bd5d4dc721/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ec2/11117052/715effb1e3d8/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ec2/11117052/fbe6b40501b5/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ec2/11117052/09f75593669b/gr5.jpg

Similar articles

1. Improving antibody language models with native pairing.
Patterns (N Y). 2024 Apr 4;5(5):100967. doi: 10.1016/j.patter.2024.100967. eCollection 2024 May 10.
2. A natively paired antibody library yields drug leads with higher sensitivity and specificity than a randomly paired antibody library.
MAbs. 2018 Apr;10(3):431-443. doi: 10.1080/19420862.2018.1426422. Epub 2018 Feb 1.
3. A curriculum learning approach to training antibody language models.
bioRxiv. 2025 Mar 2:2025.02.27.640641. doi: 10.1101/2025.02.27.640641.
4. Large scale paired antibody language models.
PLoS Comput Biol. 2024 Dec 6;20(12):e1012646. doi: 10.1371/journal.pcbi.1012646. eCollection 2024 Dec.
5. On the effect of training database size for MR-based synthetic CT generation in the head.
Comput Med Imaging Graph. 2023 Jul;107:102227. doi: 10.1016/j.compmedimag.2023.102227. Epub 2023 Apr 26.
6. A Two-Step Golden Gate Cloning Procedure for the Generation of Natively Paired YSD Fab Libraries.
Methods Mol Biol. 2023;2681:161-173. doi: 10.1007/978-1-0716-3279-6_10.
7. Combining Rosetta Sequence Design with Protein Language Model Predictions Using Evolutionary Scale Modeling (ESM) as Restraint.
ACS Synth Biol. 2024 Apr 19;13(4):1085-1092. doi: 10.1021/acssynbio.3c00753. Epub 2024 Apr 3.
8. BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure.
MAbs. 2023 Jan-Dec;15(1):2285904. doi: 10.1080/19420862.2023.2285904. Epub 2023 Nov 27.
9. Pre-training with a rational approach for antibody sequence representation.
Front Immunol. 2024 Oct 23;15:1468599. doi: 10.3389/fimmu.2024.1468599. eCollection 2024.
10. One-Pot Droplet RT-OE-PCR for the Generation of Natively Paired Antibody Immune Libraries.
Methods Mol Biol. 2023;2681:213-229. doi: 10.1007/978-1-0716-3279-6_12.

Cited by

1. Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf418.
2. nuTCRacker: Predicting the Recognition of HLA-I-Peptide Complexes by αβTCRs for Unseen Peptides.
Eur J Immunol. 2025 Jul;55(7):e51607. doi: 10.1002/eji.202451607.
3. Focused learning by antibody language models using preferential masking of non-templated regions.
Patterns (N Y). 2025 Apr 25;6(6):101239. doi: 10.1016/j.patter.2025.101239. eCollection 2025 Jun 13.
4. AMULETY: A Python package to embed adaptive immune receptor sequences.
bioRxiv. 2025 Mar 25:2025.03.21.644583. doi: 10.1101/2025.03.21.644583.
5. Supervised fine-tuning of pre-trained antibody language models improves antigen specificity prediction.
PLoS Comput Biol. 2025 Mar 31;21(3):e1012153. doi: 10.1371/journal.pcbi.1012153. eCollection 2025 Mar.
6. A curriculum learning approach to training antibody language models.
bioRxiv. 2025 Mar 2:2025.02.27.640641. doi: 10.1101/2025.02.27.640641.
7. Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery.
bioRxiv. 2025 Apr 1:2025.02.25.640114. doi: 10.1101/2025.02.25.640114.
8. Large scale paired antibody language models.
PLoS Comput Biol. 2024 Dec 6;20(12):e1012646. doi: 10.1371/journal.pcbi.1012646. eCollection 2024 Dec.
9. Focused learning by antibody language models using preferential masking of non-templated regions.
bioRxiv. 2024 Oct 28:2024.10.23.619908. doi: 10.1101/2024.10.23.619908.
10. Prediction of antibody-antigen interaction based on backbone aware with invariant point attention.
BMC Bioinformatics. 2024 Nov 6;25(1):348. doi: 10.1186/s12859-024-05961-w.

References

1. An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies.
Immunity. 2024 Oct 8;57(10):2453-2465.e7. doi: 10.1016/j.immuni.2024.07.022. Epub 2024 Aug 19.
2. Deep repertoire mining uncovers ultra-broad coronavirus neutralizing antibodies targeting multiple spike epitopes.
Cell Rep. 2024 Jun 25;43(6):114307. doi: 10.1016/j.celrep.2024.114307. Epub 2024 Jun 5.
3. Evolutionary-scale prediction of atomic-level protein structure with a language model.
Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.
4. AbLang: an antibody language model for completing antibody sequences.
Bioinform Adv. 2022 Jun 17;2(1):vbac046. doi: 10.1093/bioadv/vbac046. eCollection 2022.
5. Functional antibodies exhibit light chain coherence.
Nature. 2022 Nov;611(7935):352-357. doi: 10.1038/s41586-022-05371-z. Epub 2022 Oct 26.
6. Artificial intelligence for antibody reading comprehension: AntiBERTa.
Patterns (N Y). 2022 Jul 8;3(7):100535. doi: 10.1016/j.patter.2022.100535.
7. Deciphering the language of antibodies using self-supervised learning.
Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.
8. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences.
Protein Sci. 2022 Jan;31(1):141-146. doi: 10.1002/pro.4205. Epub 2021 Oct 29.
9. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.
10. Learning the protein language: Evolution, structure, and function.
Cell Syst. 2021 Jun 16;12(6):654-669.e3. doi: 10.1016/j.cels.2021.05.017.