De Almeida Mendes Marcus, Chihab Leila, Nilsson Jonas Birkelund, Scheffer Lonneke, Nielsen Morten, Peters Bjoern
Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, San Diego, California, USA.
Department of Health Technology, Technical University of Denmark, Copenhagen, Denmark.
Protein Sci. 2025 Sep;34(9):e70262. doi: 10.1002/pro.70262.
In this study, we analyzed large-scale T-cell receptor (TCR) sequence data to determine whether TCRs preferentially bind to major histocompatibility complex (MHC) class I (CD8+) or class II (CD4+) epitopes. Using the International ImMunoGeneTics information system (IMGT) numbering scheme, we identified specific positions with distinct amino acid enrichment for each MHC class and developed machine learning models for classification. While our frequency-based approach effectively differentiated MHC-I from MHC-II TCRs in cross-validation, performance declined on real-world peripheral blood mononuclear cell samples when only beta chain data were used. However, incorporating the TCR alpha chain significantly improved accuracy, emphasizing its importance for MHC recognition. Overall, we found that V-region loops can signal MHC class bias, which can aid immunotherapy design and TCR repertoire analysis, while highlighting the need for larger, more diverse datasets for reliable predictions.
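The abstract does not spell out the frequency-based approach, so the following is only a minimal sketch of one way such a classifier could work, not the authors' method. It assumes each TCR has already been aligned to a numbering scheme and is represented as a mapping from position label to residue. The position labels, the toy sequences, the pseudocount, and the function names (`position_frequencies`, `log_odds_score`, `predict_class`) are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(tcrs, pseudocount=1.0):
    """Estimate per-position amino acid frequencies with additive smoothing.

    tcrs: list of dicts mapping a position label (hypothetical, e.g. "27")
    to a single-letter residue.
    """
    counts = defaultdict(Counter)
    for tcr in tcrs:
        for pos, aa in tcr.items():
            counts[pos][aa] += 1
    freqs = {}
    for pos, counter in counts.items():
        total = sum(counter.values()) + pseudocount * len(AMINO_ACIDS)
        freqs[pos] = {
            aa: (counter[aa] + pseudocount) / total for aa in AMINO_ACIDS
        }
    return freqs


def log_odds_score(tcr, freqs_i, freqs_ii):
    """Sum log(f_I / f_II) over positions present in both frequency tables."""
    score = 0.0
    for pos, aa in tcr.items():
        if pos in freqs_i and pos in freqs_ii and aa in freqs_i[pos]:
            score += math.log(freqs_i[pos][aa] / freqs_ii[pos][aa])
    return score


def predict_class(tcr, freqs_i, freqs_ii):
    """Positive score leans MHC-I; zero or negative leans MHC-II (arbitrary tie rule)."""
    return "MHC-I" if log_odds_score(tcr, freqs_i, freqs_ii) > 0 else "MHC-II"


# Toy, made-up training data: NOT real TCR sequences or real enrichment patterns.
mhc_i_tcrs = [
    {"27": "S", "107": "G"},
    {"27": "S", "107": "A"},
    {"27": "T", "107": "G"},
]
mhc_ii_tcrs = [
    {"27": "D", "107": "R"},
    {"27": "D", "107": "K"},
    {"27": "N", "107": "R"},
]

freqs_i = position_frequencies(mhc_i_tcrs)
freqs_ii = position_frequencies(mhc_ii_tcrs)

print(predict_class({"27": "S", "107": "G"}, freqs_i, freqs_ii))
print(predict_class({"27": "D", "107": "R"}, freqs_i, freqs_ii))
```

In this sketch, adding alpha chain positions would simply mean adding more position keys to each dictionary (for example with a chain prefix), which is one way to picture why paired-chain data can carry more signal than the beta chain alone.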