

Deciphering the language of antibodies using self-supervised learning.

Authors

Leem Jinwoo, Mitchell Laura S, Farmery James H R, Barton Justin, Galson Jacob D

Affiliation

Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK.

Publication

Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.

DOI: 10.1016/j.patter.2022.100513
PMID: 35845836
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9278498/
Abstract

An individual's B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
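The abstract describes a two-stage pipeline: self-supervised masked-language-model pre-training on BCR amino-acid sequences, then fine-tuning for per-residue paratope prediction. A minimal sketch of both stages in PyTorch, not the authors' code: the vocabulary, model size, masking rate, toy heavy-chain fragment, and loss setup below are all invented for illustration (AntiBERTa itself is a far larger transformer encoder trained on millions of sequences, and real MLM training scores the loss only on masked positions).

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 0, 1                                  # special token ids
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB_SIZE = len(AMINO_ACIDS) + 2                 # 20 residues + 2 specials

def tokenize(seq: str) -> torch.Tensor:
    return torch.tensor([VOCAB[aa] for aa in seq])

class TinyMLM(nn.Module):
    """Miniature BERT-style encoder; positional embeddings omitted for brevity."""
    def __init__(self, d_model=32, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, ids):
        h = self.encoder(self.embed(ids))         # contextualized embeddings
        return self.lm_head(h)                    # per-position logits

# Stage 1: mask ~15% of positions and train the model to recover them.
seq = tokenize("EVQLVESGGGLVQPGG").unsqueeze(0)   # toy heavy-chain fragment
mask = torch.rand(seq.shape) < 0.15
inputs = seq.masked_fill(mask, MASK)

model = TinyMLM()
logits = model(inputs)                            # (1, 16, VOCAB_SIZE)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB_SIZE), seq.view(-1))    # simplified: all positions
loss.backward()                                   # one pre-training step

# Stage 2 (sketch): swap the LM head for a per-residue binary classifier
# to predict paratope (antigen-binding) positions from the embeddings.
clf_head = nn.Linear(32, 2)
with torch.no_grad():
    reps = model.encoder(model.embed(seq))        # pre-trained representations
paratope_logits = clf_head(reps)                  # (1, 16, 2)
```

The design choice the paper exploits is that the same pre-trained encoder serves both stages: pre-training needs only unlabeled repertoire sequences, while the small labeled paratope dataset is used only to fit the lightweight head (or to fine-tune the whole stack).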


Figures (from the PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/34d38f15d6ce/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/59309426a0dd/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/5d245c06519d/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/ba7508b595ad/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/76b5c212d209/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/0e2d99f9477b/gr5.jpg

Similar articles

1. Deciphering the language of antibodies using self-supervised learning. Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.
2. Language model-based B cell receptor sequence embeddings can effectively encode receptor specificity. Nucleic Acids Res. 2024 Jan 25;52(2):548-557. doi: 10.1093/nar/gkad1128.
3. Supervised fine-tuning of pre-trained antibody language models improves antigen specificity prediction. bioRxiv. 2024 May 13:2024.05.13.593807. doi: 10.1101/2024.05.13.593807.
4. ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. Comput Biol Chem. 2021 Aug;93:107537. doi: 10.1016/j.compbiolchem.2021.107537. Epub 2021 Jun 29.
5. The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction. Evol Bioinform Online. 2021 Dec 3;17:11769343211062608. doi: 10.1177/11769343211062608. eCollection 2021.
6. Deep contextualized embeddings for quantifying the informative content in biomedical text summarization. Comput Methods Programs Biomed. 2020 Feb;184:105117. doi: 10.1016/j.cmpb.2019.105117. Epub 2019 Oct 4.
7. When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification. BMC Med Inform Decis Mak. 2022 Apr 5;21(Suppl 9):377. doi: 10.1186/s12911-022-01829-2.
8. Model-based clinical note entity recognition for rheumatoid arthritis using bidirectional encoder representation from transformers. Quant Imaging Med Surg. 2022 Jan;12(1):184-195. doi: 10.21037/qims-21-90.
9. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models. Comput Biol Med. 2021 Apr;131:104259. doi: 10.1016/j.compbiomed.2021.104259. Epub 2021 Feb 7.
10. An analysis of protein language model embeddings for fold prediction. Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac142.

Cited by

1. SALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning. Research (Wash D C). 2025 Aug 19;8:0721. doi: 10.34133/research.0721. eCollection 2025.
2. Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution. Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf418.
3. A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder. Mol Biol Evol. 2025 Jul 30;42(8). doi: 10.1093/molbev/msaf186.
4. Application of artificial intelligence large language models in drug target discovery. Front Pharmacol. 2025 Jul 8;16:1597351. doi: 10.3389/fphar.2025.1597351. eCollection 2025.
5. Artificial intelligence-driven computational methods for antibody design and optimization. MAbs. 2025 Dec;17(1):2528902. doi: 10.1080/19420862.2025.2528902. Epub 2025 Jul 18.
6. BertADP: a fine-tuned protein language model for anti-diabetic peptide prediction. BMC Biol. 2025 Jul 15;23(1):210. doi: 10.1186/s12915-025-02312-w.
7. Tuning antibody stability and function by rational designs of framework mutations. MAbs. 2025 Dec;17(1):2532117. doi: 10.1080/19420862.2025.2532117. Epub 2025 Jul 13.
8. Progress and challenges for the application of machine learning for neglected tropical diseases. F1000Res. 2025 May 20;12:287. doi: 10.12688/f1000research.129064.2. eCollection 2023.
9. Focused learning by antibody language models using preferential masking of non-templated regions. Patterns (N Y). 2025 Apr 25;6(6):101239. doi: 10.1016/j.patter.2025.101239. eCollection 2025 Jun 13.
10. Applying computational protein design to therapeutic antibody discovery - current state and perspectives. Front Immunol. 2025 May 22;16:1571371. doi: 10.3389/fimmu.2025.1571371. eCollection 2025.

References

1. Antibody structure prediction using interpretable deep learning. Patterns (N Y). 2021 Dec 9;3(2):100406. doi: 10.1016/j.patter.2021.100406. eCollection 2022 Feb 11.
2. BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. MAbs. 2022 Jan-Dec;14(1):2020203. doi: 10.1080/19420862.2021.2020203.
3. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics. 2022 Mar 28;38(7):1877-1880. doi: 10.1093/bioinformatics/btac016.
4. Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput Biol. 2021 Dec 13;17(12):e1009675. doi: 10.1371/journal.pcbi.1009675. eCollection 2021 Dec.
5. Different B cell subpopulations show distinct patterns in their IgH repertoire metrics. Elife. 2021 Oct 18;10:e73111. doi: 10.7554/eLife.73111.
6. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 2022 Jan;31(1):141-146. doi: 10.1002/pro.4205. Epub 2021 Oct 29.
7. Intrinsic physicochemical profile of marketed antibody-based biotherapeutics. Proc Natl Acad Sci U S A. 2021 Sep 14;118(37). doi: 10.1073/pnas.2020577118.
8. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.
9. Tumor-Infiltrating B Lymphocyte Profiling Identifies IgG-Biased, Clonally Expanded Prognostic Phenotypes in Triple-Negative Breast Cancer. Cancer Res. 2021 Aug 15;81(16):4290-4304. doi: 10.1158/0008-5472.CAN-20-3773. Epub 2021 Jun 15.
10. Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics. 2021 Nov 18;37(22):4041-4047. doi: 10.1093/bioinformatics/btab434.