
Pre-training with a rational approach for antibody sequence representation.

Affiliations

XtalPi Innovation Center, XtalPi Inc., Beijing, China.

School of Medical Technology, Beijing Institute of Technology, Beijing, China.

Publication

Front Immunol. 2024 Oct 23;15:1468599. doi: 10.3389/fimmu.2024.1468599. eCollection 2024.

DOI: 10.3389/fimmu.2024.1468599
PMID: 39507535
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11537868/
Abstract

INTRODUCTION

Antibodies represent a specific class of proteins produced by the adaptive immune system in response to pathogens. Mining the information embedded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. However, antibodies possess unique features that should be incorporated using specifically designed training methods, leaving room for improvement in pre-training models for antibody sequences.

METHODS

In this study, we present a Pre-trained model of Antibody sequences trained with a Rational Approach for antibodies (PARA). PARA employs a strategy conforming to antibody sequence patterns and an advanced natural language processing self-encoding model structure. This approach addresses the limitations of existing protein pre-training models, which primarily utilize language models without fully considering the differences between protein sequences and language sequences.
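The abstract does not spell out PARA's masking scheme, but the core idea of a pre-training strategy "conforming to antibody sequence patterns" can be illustrated with a region-aware masked-language-model sketch. Everything below is an assumption for illustration, not the paper's actual method: the function name, the masking rates, and the caller-supplied CDR coordinates (which in practice would come from an antibody numbering scheme) are all hypothetical.

```python
import random

MASK = "[MASK]"

def mask_antibody_sequence(seq, cdr_ranges, p_cdr=0.25, p_fwk=0.10, rng=None):
    """Region-aware masking for masked-language-model pre-training (illustrative).

    Positions inside the caller-supplied CDR ranges (half-open [start, end)
    index pairs) are masked at a higher rate than framework positions,
    reflecting the intuition that CDRs carry the most variable residues.
    Returns (tokens, labels): labels hold the original residue at masked
    positions and None elsewhere, the usual MLM training target layout.
    """
    rng = rng or random.Random()
    in_cdr = set()
    for start, end in cdr_ranges:
        in_cdr.update(range(start, end))
    tokens, labels = [], []
    for i, aa in enumerate(seq):
        p = p_cdr if i in in_cdr else p_fwk  # region-dependent masking rate
        if rng.random() < p:
            tokens.append(MASK)
            labels.append(aa)
        else:
            tokens.append(aa)
            labels.append(None)
    return tokens, labels

# Toy heavy-chain framework fragment; the CDR coordinates are made up.
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"
tokens, labels = mask_antibody_sequence(seq, [(25, 30)], rng=random.Random(0))
```

A plain-protein language model would apply one uniform masking rate; weighting the rate by antibody region is one simple way a pre-training objective can be adapted to antibody-specific sequence structure.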

RESULTS

We demonstrate PARA's performance on several tasks by comparing it to various published pre-training models of antibodies. The results show that PARA significantly outperforms existing models on these tasks, suggesting that PARA has an advantage in capturing antibody sequence information.

DISCUSSION

The antibody latent representation provided by PARA can substantially facilitate studies in relevant areas. We believe that PARA's superior performance in capturing antibody sequence information offers significant potential for both antibody property prediction and the development of novel therapeutics. PARA is available at https://github.com/xtalpi-xic.


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/4e6181e11f37/fimmu-15-1468599-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/02f24727fa16/fimmu-15-1468599-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/dc12d7fefc82/fimmu-15-1468599-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/609f0bbcba2c/fimmu-15-1468599-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/57ba32741bed/fimmu-15-1468599-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/4eda849ddd79/fimmu-15-1468599-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/cc3fd0931751/fimmu-15-1468599-g007.jpg

Similar articles

1. Pre-training with a rational approach for antibody sequence representation. Front Immunol. 2024 Oct 23;15:1468599. doi: 10.3389/fimmu.2024.1468599. eCollection 2024.
2. Accurate prediction of antibody function and structure using bio-inspired antibody language model. Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae245.
3. BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure. MAbs. 2023 Jan-Dec;15(1):2285904. doi: 10.1080/19420862.2023.2285904. Epub 2023 Nov 27.
4. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics. 2019 Dec 17;20(1):723. doi: 10.1186/s12859-019-3220-8.
5. NanoBERTa-ASP: predicting nanobody paratope based on a pretrained RoBERTa model. BMC Bioinformatics. 2024 Mar 21;25(1):122. doi: 10.1186/s12859-024-05750-5.
6. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
7. AbLEF: antibody language ensemble fusion for thermodynamically empowered property predictions. Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae268.
8. Addressing the antibody germline bias and its effect on language models for improved antibody design. Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae618.
9. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One. 2015 Nov 10;10(11):e0141287. doi: 10.1371/journal.pone.0141287. eCollection 2015.
10. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics. 2024 Jun 28;40(Suppl 1):i357-i368. doi: 10.1093/bioinformatics/btae260.

Cited by

1. Artificial intelligence in antibody design and development: harnessing the power of computational approaches. Med Biol Eng Comput. 2025 Sep 1. doi: 10.1007/s11517-025-03429-4.
