Assessing the generalization capabilities of TCR binding predictors via peptide distance analysis.

Authors

Castorina Leonardo V, Grazioli Filippo, Machart Pierre, Mösch Anja, Errica Federico

Affiliations

School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.

NEC Laboratories Europe, Heidelberg, Germany.

Publication information

PLoS One. 2025 May 20;20(5):e0324011. doi: 10.1371/journal.pone.0324011. eCollection 2025.

DOI:10.1371/journal.pone.0324011

PMID:40392871

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12091837/

Abstract

Understanding the interaction between T Cell Receptors (TCRs) and peptide-bound Major Histocompatibility Complexes (pMHCs) is crucial for comprehending immune responses and developing targeted immunotherapies. While recent machine learning (ML) models show remarkable success in predicting TCR-pMHC binding within training data, these models often fail to generalize to peptides outside their training distributions, raising concerns about their applicability in therapeutic settings. Understanding and improving the generalization of these models is therefore critical to ensure real-world applications. To address this issue, we evaluate the effect of the distance between training and testing peptide distributions on ML model empirical risk assessments, using sequence-based and 3D structure-based distance metrics. In our analysis we use several state-of-the-art models for TCR-peptide binding prediction: Attentive Variational Information Bottleneck (AVIB), NetTCR-2.0 and -2.2, and ERGO II (pre-trained autoencoder) and ERGO II (LSTM). In this work, we introduce a novel approach for assessing the generalization capabilities of TCR binding predictors: the Distance Split (DS) algorithm. The DS algorithm controls the distance between training and testing peptides based on both sequence and structure, allowing for a more nuanced evaluation of model performance. We show that lower 3D shape similarity between training and test peptides is associated with a harder out-of-distribution task definition, which is more interesting when measuring the ability to generalize to unseen peptides. However, we observe the opposite effect when splitting using sequence-based similarity. These findings highlight the importance of using a distance-based splitting approach to benchmark models. This could then be used to estimate a confidence score on predictions on novel and unseen peptides, based on how different they are from the training ones. Additionally, our results may hint that employing 3D shape to complement sequence information could improve the accuracy of TCR-pMHC binding predictors.
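
The abstract does not spell out how the DS algorithm works internally, so the following is only a rough sketch of the general idea of a distance-controlled split, not the authors' procedure. It assumes plain Levenshtein edit distance as the sequence-based metric (the paper's actual metrics may differ) and holds out the peptides that are, on average, farthest from all others. The function names `levenshtein`, `distance_to_set`, and `distance_split` are hypothetical and chosen for illustration. A structure-based variant would need 3D shape descriptors, which are not shown here.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_to_set(peptide: str, others: Sequence[str]) -> int:
    """Distance from a peptide to its nearest neighbour in `others`."""
    return min(levenshtein(peptide, other) for other in others)


def distance_split(peptides: Sequence[str], n_test: int) -> Tuple[List[str], List[str]]:
    """Hold out the n_test peptides with the largest mean distance to the rest.

    Assumption for illustration only: "farthest on average" is used as the
    criterion for building a hard, out-of-distribution test set.
    """
    unique = sorted(set(peptides))
    if not 0 < n_test < len(unique):
        raise ValueError("n_test must be between 1 and len(unique peptides) - 1")
    mean_dist = {
        p: sum(levenshtein(p, q) for q in unique if q != p) / (len(unique) - 1)
        for p in unique
    }
    # Sort by decreasing mean distance; break ties alphabetically.
    ranked = sorted(unique, key=lambda p: (-mean_dist[p], p))
    test = sorted(ranked[:n_test])
    train = sorted(ranked[n_test:])
    return train, test


if __name__ == "__main__":
    toy_peptides = ["AAAA", "AAAB", "AABA", "WWWW"]  # toy data, not from the paper
    train, test = distance_split(toy_peptides, n_test=1)
    print("train:", train)
    print("test:", test)
    # Nearest-training-peptide distance as a crude novelty indicator.
    print("novelty of test[0]:", distance_to_set(test[0], train))
```

In the same spirit as the confidence score mentioned in the abstract, `distance_to_set` could serve as a crude novelty indicator for an unseen peptide: the larger its distance to the nearest training peptide, the less a prediction for it should be trusted. How such a distance maps to an actual confidence value is left open here.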
