Deciphering the language of antibodies using self-supervised learning.

Author information

Leem Jinwoo, Mitchell Laura S, Farmery James H R, Barton Justin, Galson Jacob D

Affiliation

Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK.

Publication information

Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.

Abstract

An individual's B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
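The paratope-prediction case study described above amounts to per-residue (token-level) binary classification on top of the pre-trained encoder. The sketch below illustrates how such fine-tuning could look with the Hugging Face transformers library; it is an illustration under stated assumptions, not the authors' released code, and the checkpoint path, tokenizer vocabulary, and labels are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the authors' released code): fine-tuning a
# RoBERTa-style antibody language model for per-residue paratope prediction,
# i.e. binary token classification. The checkpoint path and tokenization
# scheme are hypothetical placeholders.
import torch
from transformers import RobertaTokenizerFast, RobertaForTokenClassification

CHECKPOINT = "path/to/pretrained-antibody-lm"  # hypothetical pre-trained checkpoint

tokenizer = RobertaTokenizerFast.from_pretrained(CHECKPOINT)
model = RobertaForTokenClassification.from_pretrained(CHECKPOINT, num_labels=2)

# Heavy-chain variable-region fragment, assuming a residue-level vocabulary
# (one token per amino acid).
sequence = "E V Q L V E S G G G L V Q P G G S L R L S C A A S"
enc = tokenizer(sequence, return_tensors="pt", return_special_tokens_mask=True)
special = enc.pop("special_tokens_mask").bool()

# Per-residue labels: 1 = paratope, 0 = non-paratope (dummy zeros here);
# special tokens are set to -100 so the loss ignores them.
labels = torch.zeros_like(enc["input_ids"])
labels[special] = -100

# One fine-tuning step: token-level cross-entropy against the paratope labels.
out = model(**enc, labels=labels)
out.loss.backward()

# Per-residue contextual embeddings for other downstream tasks come straight
# from the encoder, without the classification head.
embeddings = model.roberta(**enc).last_hidden_state  # (1, seq_len, hidden_dim)

# At inference time, per-residue paratope probabilities:
with torch.no_grad():
    paratope_prob = torch.softmax(model(**enc).logits, dim=-1)[..., 1]
```

Masking special tokens with -100 keeps them out of the cross-entropy loss, so only genuine sequence positions contribute to training; the same encoder outputs provide the contextualized BCR representations that can feed other downstream tasks.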


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/34d38f15d6ce/fx1.jpg
