Wang Zicheng, Shen Yufeng
bioRxiv. 2025 May 4:2025.04.29.651344. doi: 10.1101/2025.04.29.651344.
The binding of peptide-MHC complexes by T cell receptors (TCRs) is crucial for T cell antigen recognition in adaptive immunity. High-throughput multiplex assays have generated valuable data and insights about the antigen specificity of TCRs. However, identifying which TCRs recognize which antigens remains a significant challenge because of the immense diversity of TCRs. Here we describe G2VTCR (Graph2Vec-based Representation and Embedding of TCR and Targets for Enhanced Recognition Analysis), a computational method that uses atomic-level graph embedding to predict TCR-antigen recognition. G2VTCR represents antigens and the third complementarity-determining region (CDR3) of TCR sequences as graphs, in which nodes encode atomic identities and edges encode chemical bonds between atoms, and then uses Weisfeiler-Lehman iterations to produce embeddings. The embeddings can be used for supervised classification in TCR-antigen binding prediction and for unsupervised clustering of TCRs. We evaluated G2VTCR using publicly available paired TCR-CDR3/antigen data generated by antigen-stimulation experiments. We show that G2VTCR performs better in both classification and clustering than other embedding methods, including pre-trained protein language models. We also investigated how the number of Weisfeiler-Lehman iterations and the sample size of TCR CDR3 sequences affect classification performance. Our results highlight the utility of atomic-level graph embedding of immune repertoire sequences for antigen specificity prediction.
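The graph construction and Weisfeiler-Lehman (WL) relabeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `wl_features` and `cosine`, the hand-built heavy-atom graphs for glycine and alanine, the choice to ignore bond orders and hydrogens, and the MD5-based label compression are all assumptions made for illustration. Graph2Vec additionally learns a dense embedding from the WL subtree labels; that step is replaced here by a plain bag-of-labels count vector.

```python
import hashlib
from collections import Counter


def wl_features(atoms, bonds, iterations=2):
    """Return a Counter of WL subtree labels for one molecular graph.

    atoms: list of element symbols, one per node (hypothetical input format).
    bonds: list of (i, j) index pairs, treated as undirected edges.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    labels = list(atoms)      # iteration 0: the label is the atomic identity
    bag = Counter(labels)
    for _ in range(iterations):
        new_labels = []
        for i in range(len(atoms)):
            # A node's new label combines its own label with the sorted
            # multiset of its neighbors' labels, then is compressed by hashing.
            neighborhood = ",".join(sorted(labels[j] for j in neighbors[i]))
            signature = labels[i] + "|" + neighborhood
            new_labels.append(hashlib.md5(signature.encode()).hexdigest()[:8])
        labels = new_labels
        bag.update(labels)
    return bag


def cosine(a, b):
    """Cosine similarity between two label-count vectors."""
    dot = sum(count * b[label] for label, count in a.items())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b)


# Toy heavy-atom graphs (assumption: hydrogens and bond orders omitted).
# Glycine: N - CA - C(=O) - O
gly_atoms = ["N", "C", "C", "O", "O"]
gly_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
# Alanine: glycine plus a methyl carbon on CA
ala_atoms = gly_atoms + ["C"]
ala_bonds = gly_bonds + [(1, 5)]

gly = wl_features(gly_atoms, gly_bonds)
ala = wl_features(ala_atoms, ala_bonds)
print("gly vs ala similarity:", round(cosine(gly, ala), 3))
```

Each extra iteration lets a label summarize a neighborhood one bond wider, which is why the number of WL iterations can plausibly influence downstream classification. A full peptide or CDR3 graph would be assembled from residue-level atom graphs joined by peptide bonds, for example with a cheminformatics toolkit.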