Multi-positive contrastive learning-based cross-attention model for T cell receptor-antigen binding prediction.

Author information

Shuai Yi, Shen Pengcheng, Zhang Xianrui

Affiliations

Peng Cheng Laboratory, Shenzhen, 518066, China.

State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Rd., Minhang District, Shanghai, 200240, China.

Publication information

Comput Methods Programs Biomed. 2025 May 10;268:108797. doi: 10.1016/j.cmpb.2025.108797.

DOI:10.1016/j.cmpb.2025.108797

PMID:40378554

Abstract

BACKGROUND AND OBJECTIVE

T cells play a vital role in the immune system by recognizing and eliminating infected or cancerous cells, thus driving adaptive immune responses. Their activation is triggered by the binding of T cell receptors (TCRs) to epitopes presented on Major Histocompatibility Complex (MHC) molecules. However, experimentally identifying antigens that T cells can recognize and that are immunogenic is resource-intensive, and most candidates prove non-immunogenic, which underscores the need for computational tools that predict binding between peptide-MHC (pMHC) complexes and TCRs. Despite extensive efforts, accurately predicting TCR-antigen binding pairs remains challenging because of the vast diversity of TCRs.

METHODS

In this study, we propose ConTCR, a Contrastive Cross-attention model for predicting TCR-pMHC binding. First, pretrained encoders transform the pMHC and TCR sequences into high-level embeddings that serve as feature representations. Then, multi-modal cross-attention combines the features of the pMHC and TCR sequences. Next, following a contrastive learning strategy, we pretrain the ConTCR backbone to strengthen its feature extraction for pMHC and TCR sequences. Finally, the model is fine-tuned to classify positive and negative samples.
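The two core operations named in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not give the attention layout, the loss formula, or how positives are defined. The sketch assumes single-head scaled dot-product cross-attention (TCR residues as queries, pMHC residues as keys and values) and a supervised-contrastive-style loss in which every other sample sharing an anchor's label counts as a positive. The function names, the `temperature` default, and the meaning of `labels` (for example, a shared epitope) are all hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (assumed layout).

    queries: per-residue vectors of one sequence (e.g. the TCR)
    keys, values: per-residue vectors of the other sequence (e.g. the pMHC)
    Returns the attended outputs and the attention weights; the weights
    are what an attention score heatmap would display.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


def multi_positive_contrastive_loss(embeddings, labels, temperature=0.1):
    """Contrastive loss with several positives per anchor (assumed form).

    Embeddings are L2-normalised (they must be non-zero). For each anchor,
    the loss averages -log softmax probability over all positives, i.e.
    all other samples carrying the same label.
    """
    normed = []
    for e in embeddings:
        norm = math.sqrt(dot(e, e))
        normed.append([x / norm for x in e])
    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: dot(normed[i], normed[j]) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With toy inputs, the loss is small when samples sharing a label lie close together and large when they do not, which is the property contrastive pretraining exploits before the classification fine-tuning step.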

RESULTS

With this strategy, the proposed model effectively captures critical information on TCR-pMHC interactions, and attention score heatmaps visualize the model for interpretability. ConTCR demonstrates strong generalization in predicting binding specificity for unseen epitopes and diverse TCR repertoires. On independent non-zero-shot test sets, the model achieved AUC-ROC scores of 0.849 and 0.950; on zero-shot test sets, it obtained AUC-ROC scores of 0.830 and 0.938.
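For readers less familiar with the metric, AUC-ROC is the probability that a randomly chosen binding pair receives a higher score than a randomly chosen non-binding pair, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions and are unrelated to the study's data or evaluation code.

```python
def auc_roc(labels, scores):
    """AUC-ROC by pairwise comparison of positive and negative scores.

    labels: 1 for a binding pair, 0 for a non-binding pair
    scores: predicted binding scores, higher meaning more likely to bind
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is ranked wrongly.
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it suits small checks like this one; a rank-based implementation is the usual choice for large test sets.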

CONCLUSION

Our framework offers a promising solution for improving pMHC-TCR binding prediction and model interpretability. By leveraging the ConTCR model and pMHC-TCR features, we achieve higher precision than recent advanced models. Overall, ConTCR is a robust tool for predicting pMHC-TCR binding and holds significant promise for advancing TCR-based immunotherapies. The codes and data used in this study are available at this website.
