Addressing the antibody germline bias and its effect on language models for improved antibody design.

Affiliations

Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom.

GSK Medicines Research Centre, GSK, Stevenage SG1 2NY, United Kingdom.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae618.

DOI:10.1093/bioinformatics/btae618
PMID:39460949
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11543624/
Abstract

MOTIVATION

The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive, and time-consuming task, with the final antibody needing to not only have strong and specific binding but also be minimally impacted by developability issues. The success of transformer-based language models in protein sequence space and the availability of vast amounts of antibody sequences have led to the development of many antibody-specific language models to help guide antibody design. Antibody diversity primarily arises from V(D)J recombination, mutations within the CDRs, and/or from a few nongermline mutations outside the CDRs. Consequently, a significant portion of the variable domain of all natural antibody sequences remains germline. This affects the pre-training of antibody-specific language models, where this facet of the sequence data introduces a prevailing bias toward germline residues. This poses a challenge, as mutations away from the germline are often vital for generating specific and potent binding to a target, meaning that language models need to be able to suggest key mutations away from germline.
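
The germline bias described above can be made concrete with a small sketch. It is an illustration only, not the authors' evaluation code: the two sequence fragments are hypothetical toy data, they are assumed to be pre-aligned to the same length, and the helper names `split_positions` and `accuracy` are invented for this example. The sketch shows that a predictor which always outputs the germline residue looks accurate overall while never recovering a single nongermline residue.

```python
def split_positions(observed, germline):
    """Split aligned positions into germline and nongermline indices.

    Both strings are assumed to be pre-aligned and of equal length.
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    germ, nongerm = [], []
    for i, (obs_aa, germ_aa) in enumerate(zip(observed, germline)):
        (germ if obs_aa == germ_aa else nongerm).append(i)
    return germ, nongerm


def accuracy(predicted, truth, positions):
    """Fraction of the given positions where the prediction matches the truth."""
    positions = list(positions)
    if not positions:
        return float("nan")
    hits = sum(predicted[i] == truth[i] for i in positions)
    return hits / len(positions)


# Hypothetical toy fragments (not taken from the paper).
germline = "EVQLVESGGGLVQPGGSLRL"
observed = "EVQLVESGGGLVKPGGSLRV"  # two substitutions away from germline

germ_idx, nongerm_idx = split_positions(observed, germline)

# A maximally germline-biased "model": it always predicts the germline residue.
predicted = germline

overall_acc = accuracy(predicted, observed, range(len(observed)))
germline_acc = accuracy(predicted, observed, germ_idx)
nongermline_acc = accuracy(predicted, observed, nongerm_idx)

print("nongermline positions:", nongerm_idx)
print("overall accuracy:", overall_acc)
print("germline accuracy:", germline_acc)
print("nongermline accuracy:", nongermline_acc)
```

Reporting accuracy separately for the two position classes, rather than as one overall number, is what exposes the bias: the overall score is dominated by the many germline positions.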

RESULTS

In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new antibody-specific language models optimized for predicting nongermline residues. We then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability.
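
One way to read "a diverse set of valid mutations with high cumulative probability" is sketched below. This is an assumption about how such a summary could be computed, not the metric definition used in the paper: the probability tables, the set of "valid" residues, the `min_prob` threshold, and the function name `mutation_summary` are all hypothetical. For one masked position, the sketch sums the probability a model assigns to valid nongermline residues and lists which nongermline residues it would actually suggest.

```python
def mutation_summary(probs, germline_residue, valid_residues, min_prob=0.05):
    """Summarise a model's nongermline suggestions at one masked position.

    probs: dict mapping amino acid -> predicted probability (should sum to 1).
    Returns (cumulative probability on valid nongermline residues,
             nongermline residues with probability >= min_prob, most likely first).
    """
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    candidates = {aa: p for aa, p in probs.items() if aa != germline_residue}
    cumulative_valid = sum(p for aa, p in candidates.items() if aa in valid_residues)
    suggested = sorted(
        (aa for aa, p in candidates.items() if p >= min_prob),
        key=lambda aa: -candidates[aa],
    )
    return cumulative_valid, suggested


# Hypothetical distributions at a position whose germline residue is Q.
germline_biased = {"Q": 0.90, "K": 0.04, "R": 0.03, "E": 0.02, "H": 0.01}
less_biased = {"Q": 0.40, "K": 0.25, "R": 0.20, "E": 0.10, "H": 0.05}
valid = {"K", "R", "E"}  # assumed set of acceptable substitutions

biased_cum, biased_suggested = mutation_summary(germline_biased, "Q", valid)
better_cum, better_suggested = mutation_summary(less_biased, "Q", valid)

print("germline-biased model:", biased_cum, biased_suggested)
print("less-biased model:", better_cum, better_suggested)
```

In this toy case the germline-biased distribution places little mass on valid substitutions and suggests none above the threshold, while the less-biased one spreads more mass over several candidates.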

AVAILABILITY AND IMPLEMENTATION

AbLang-2 is trained on both unpaired and paired data, and is freely available at https://github.com/oxpig/AbLang2.git.
