Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity.

Affiliation

School of Computer Science and Technology, Nanjing Tech University, 211816, Nanjing, China.

Publication information

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac378.

Abstract

MOTIVATION

It has been shown that only a small fraction of the neoantigens presented by major histocompatibility complex (MHC) class I molecules on the cell surface can elicit a T cell response. This restriction can be attributed to the binding specificity between the T cell receptor (TCR) and the peptide-MHC complex (pMHC). Computational prediction of T cell binding to neoantigens remains a challenging and unresolved task.

RESULTS

In this paper, we propose an attention-aware contrastive learning model, ATMTCR, to infer TCR-pMHC binding specificity. For each TCR sequence, we used a transformer encoder to map it to a latent representation, and then masked a percentage of its amino acids, guided by attention weights, to generate its contrastive view. Compared with a fully supervised baseline model, contrastive learning-based pretraining on large-scale TCR sequences significantly improved prediction performance on downstream tasks. Interestingly, masking a percentage of the amino acids with low attention weights yielded the best performance among the masking strategies tested. Comparison experiments on two independent datasets demonstrated that our method outperformed other existing algorithms. Moreover, we identified important amino acids and their positional preferences through the attention weights, which indicates the potential interpretability of the proposed model.
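The attention-guided masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_low_attention`, the `MASK_TOKEN` symbol, the 25% mask ratio, the example sequence, and the per-residue attention weights are all assumptions made for illustration. In the actual model the weights would come from the transformer encoder, and the masked view would be passed back through the encoder for the contrastive objective.

```python
from typing import Sequence

MASK_TOKEN = "X"  # hypothetical placeholder symbol for a masked residue


def mask_low_attention(sequence: str, attention: Sequence[float],
                       mask_ratio: float = 0.25) -> str:
    """Return a view of `sequence` in which the lowest-attention residues are masked."""
    if len(sequence) != len(attention):
        raise ValueError("sequence and attention must have the same length")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    # Number of residues to mask, rounded down.
    n_mask = int(len(sequence) * mask_ratio)
    # Positions ordered by ascending attention; sorted() is stable,
    # so ties are resolved in favour of the earlier position.
    order = sorted(range(len(sequence)), key=lambda i: attention[i])
    masked = set(order[:n_mask])
    return "".join(MASK_TOKEN if i in masked else aa
                   for i, aa in enumerate(sequence))


# Illustrative input only: a CDR3-like string and made-up attention weights.
cdr3 = "CASSLGQAYEQYF"
weights = [0.02, 0.05, 0.10, 0.12, 0.09, 0.15, 0.11,
           0.08, 0.07, 0.06, 0.05, 0.04, 0.06]
view = mask_low_attention(cdr3, weights, mask_ratio=0.25)
print(view)
```

Masking the low-attention positions keeps the residues the encoder considers most informative intact in the contrastive view, which is consistent with the abstract's observation that this strategy performed best among those compared.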
