
BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure.

Affiliations

Information Center, Academy of Military Medical Sciences, Beijing, China.

State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.

Publication Information

MAbs. 2023 Jan-Dec;15(1):2285904. doi: 10.1080/19420862.2023.2285904. Epub 2023 Nov 27.

Abstract

Prior research has generated a vast number of antibody sequences, which has allowed the pre-training of language models on amino acid sequences to improve the efficiency of antibody screening and optimization. However, compared to those for proteins, there are fewer pre-trained language models available for antibody sequences. Additionally, existing pre-trained models rely solely on embedding representations using amino acids or k-mers, which do not explicitly account for the role of secondary-structure features. Here, we present a new pre-trained model called BERT2DAb. This model incorporates secondary-structure information based on self-attention to learn representations of antibody sequences. Our model achieves state-of-the-art performance on three downstream tasks, including two antigen-antibody binding classification tasks (precision: 85.15%/94.86%; recall: 87.41%/86.15%) and one antigen-antibody complex mutation binding free energy prediction task (Pearson correlation coefficient: 0.77). Moreover, we propose a novel method to analyze the relationship between attention weights and contact states of pairs of subsequences in tertiary structures. This enhances the interpretability of BERT2DAb. Overall, our model demonstrates strong potential for improving antibody screening and design through downstream applications.
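The record gives no implementation details, so the following is a minimal illustrative sketch only, not the authors' released code. It shows how a BERT-style antibody language model of this kind might be queried for sequence embeddings and attention maps through the Hugging Face transformers API. The checkpoint path is a placeholder, and the plain per-residue tokenization is an assumption; per the abstract, BERT2DAb's own inputs are secondary-structure-informed subsequences.

```python
# A minimal sketch, assuming a BERT-compatible checkpoint and tokenizer
# exist locally; "path/to/BERT2DAb_H" is a placeholder, not a real path.
import torch
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "path/to/BERT2DAb_H"  # hypothetical heavy-chain checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT, output_attentions=True)
model.eval()

# Illustrative heavy-chain variable-region fragment. The paper tokenizes
# sequences into secondary-structure-informed subsequences; simple
# space-separated residues are used here as a stand-in.
sequence = " ".join("EVQLVESGGGLVQPGGSLRLSCAAS")

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token embeddings: (batch, tokens, hidden). The [CLS] vector is a
# common whole-sequence feature for a downstream binding classifier.
token_embeddings = outputs.last_hidden_state
sequence_embedding = token_embeddings[:, 0, :]

# Attention weights: one (batch, heads, tokens, tokens) tensor per layer.
# The paper relates weights like these to contact states of subsequence
# pairs in the tertiary structure; averaging over heads gives one map.
attn_last_layer = outputs.attentions[-1].mean(dim=1)

print(sequence_embedding.shape, attn_last_layer.shape)
```

For the reported downstream tasks, features like `sequence_embedding` would feed an antigen-antibody binding classifier or a binding free energy regressor; those task-specific heads are not part of this sketch.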


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cd5/10793684/77ddbc67837f/KMAB_A_2285904_F0001_OC.jpg
