
Pre-training with a rational approach for antibody sequence representation.

Affiliations

XtalPi Innovation Center, XtalPi Inc., Beijing, China.

School of Medical Technology, Beijing Institute of Technology, Beijing, China.

Publication

Front Immunol. 2024 Oct 23;15:1468599. doi: 10.3389/fimmu.2024.1468599. eCollection 2024.

DOI: 10.3389/fimmu.2024.1468599
PMID: 39507535
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11537868/
Abstract

INTRODUCTION

Antibodies represent a specific class of proteins produced by the adaptive immune system in response to pathogens. Mining the information embedded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. However, antibodies possess unique features that should be incorporated using specifically designed training methods, leaving room for improvement in pre-training models for antibody sequences.

METHODS

In this study, we present a Pre-trained model of Antibody sequences trained with a Rational Approach for antibodies (PARA). PARA employs a strategy conforming to antibody sequence patterns and an advanced natural language processing self-encoding model structure. This approach addresses the limitations of existing protein pre-training models, which primarily utilize language models without fully considering the differences between protein sequences and language sequences.
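The abstract does not spell out PARA's masking scheme, but the core idea of a pre-training strategy "conforming to antibody sequence patterns" can be illustrated with a region-aware masked-language-model sketch. Everything below is an assumption for illustration, not the paper's actual method: the function name, the masking rates, and the caller-supplied CDR coordinates (which in practice would come from an antibody numbering scheme) are all hypothetical.

```python
import random

MASK = "[MASK]"

def mask_antibody_sequence(seq, cdr_ranges, p_cdr=0.25, p_fwk=0.10, rng=None):
    """Region-aware masking for masked-language-model pre-training (illustrative).

    Positions inside the caller-supplied CDR ranges (half-open [start, end)
    index pairs) are masked at a higher rate than framework positions,
    reflecting the intuition that CDRs carry the most variable residues.
    Returns (tokens, labels): labels hold the original residue at masked
    positions and None elsewhere, the usual MLM training target layout.
    """
    rng = rng or random.Random()
    in_cdr = set()
    for start, end in cdr_ranges:
        in_cdr.update(range(start, end))
    tokens, labels = [], []
    for i, aa in enumerate(seq):
        p = p_cdr if i in in_cdr else p_fwk  # region-dependent masking rate
        if rng.random() < p:
            tokens.append(MASK)
            labels.append(aa)
        else:
            tokens.append(aa)
            labels.append(None)
    return tokens, labels

# Toy heavy-chain framework fragment; the CDR coordinates are made up.
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"
tokens, labels = mask_antibody_sequence(seq, [(25, 30)], rng=random.Random(0))
```

A plain-protein language model would apply one uniform masking rate; weighting the rate by antibody region is one simple way a pre-training objective can be adapted to antibody-specific sequence structure.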

RESULTS

We demonstrate PARA's performance on several tasks by comparing it to various published pre-training models of antibodies. The results show that PARA significantly outperforms existing models on these tasks, suggesting that PARA has an advantage in capturing antibody sequence information.

DISCUSSION

The antibody latent representation provided by PARA can substantially facilitate studies in relevant areas. We believe that PARA's superior performance in capturing antibody sequence information offers significant potential for both antibody property prediction and the development of novel therapeutics. PARA is available at https://github.com/xtalpi-xic.


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/4e6181e11f37/fimmu-15-1468599-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/02f24727fa16/fimmu-15-1468599-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/dc12d7fefc82/fimmu-15-1468599-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/609f0bbcba2c/fimmu-15-1468599-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/57ba32741bed/fimmu-15-1468599-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/4eda849ddd79/fimmu-15-1468599-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf56/11537868/cc3fd0931751/fimmu-15-1468599-g007.jpg

Similar articles

1. Pre-training with a rational approach for antibody sequence representation. Front Immunol. 2024 Oct 23;15:1468599. doi: 10.3389/fimmu.2024.1468599. eCollection 2024.
2. Accurate prediction of antibody function and structure using bio-inspired antibody language model. Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae245.
3. BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure. MAbs. 2023 Jan-Dec;15(1):2285904. doi: 10.1080/19420862.2023.2285904. Epub 2023 Nov 27.
4. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics. 2019 Dec 17;20(1):723. doi: 10.1186/s12859-019-3220-8.
5. NanoBERTa-ASP: predicting nanobody paratope based on a pretrained RoBERTa model. BMC Bioinformatics. 2024 Mar 21;25(1):122. doi: 10.1186/s12859-024-05750-5.
6. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
7. AbLEF: antibody language ensemble fusion for thermodynamically empowered property predictions. Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae268.
8. Addressing the antibody germline bias and its effect on language models for improved antibody design. Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae618.
9. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One. 2015 Nov 10;10(11):e0141287. doi: 10.1371/journal.pone.0141287. eCollection 2015.
10. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics. 2024 Jun 28;40(Suppl 1):i357-i368. doi: 10.1093/bioinformatics/btae260.

Cited by

1. Artificial intelligence in antibody design and development: harnessing the power of computational approaches. Med Biol Eng Comput. 2025 Sep 1. doi: 10.1007/s11517-025-03429-4.
