Deciphering the language of antibodies using self-supervised learning.

Author information

Leem Jinwoo, Mitchell Laura S, Farmery James H R, Barton Justin, Galson Jacob D

Affiliation

Alchemab Therapeutics, Ltd., East Side, Office 1.02, Kings Cross, London N1C 4AX, UK.

Publication information

Patterns (N Y). 2022 May 18;3(7):100513. doi: 10.1016/j.patter.2022.100513. eCollection 2022 Jul 8.

Abstract

An individual's B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dca6/9278498/34d38f15d6ce/fx1.jpg
