Nilsson Jonas Birkelund, Greenbaum Jason, Peters Bjoern, Nielsen Morten
Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA, United States.
Front Immunol. 2025 Aug 7;16:1616113. doi: 10.3389/fimmu.2025.1616113. eCollection 2025.
Identification of CD8+ T cell epitopes is crucial for advancing vaccine development and immunotherapy strategies. Traditional methods for predicting T cell epitopes focus primarily on MHC presentation and are trained on immunopeptidome data. Recent work, however, suggests that transfer learning and refinement on epitope data can improve performance significantly.
To investigate this further, we develop an enhanced MHC class I (MHC-I) antigen presentation predictor by integrating newly curated binding affinity and eluted ligand datasets, expanding MHC allele coverage, and incorporating novel input features related to the structural constraints of the MHC-I peptide-binding cleft. We then refine the prediction method by transfer learning on experimentally validated pathogen- and cancer-derived epitopes from public databases, partitioning the data carefully to prevent performance overestimation.
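The abstract does not specify how the data were partitioned. As a rough illustration of why partitioning matters, the Python sketch below groups peptides that share any 8-residue subsequence and assigns each whole group to a single cross-validation fold, so near-identical peptides cannot appear on both sides of a train/test split. The shared-subsequence criterion, the function name `partition_by_shared_kmer`, and the parameters `k` and `n_folds` are assumptions made for this example, not the authors' confirmed procedure.

```python
from collections import defaultdict


def partition_by_shared_kmer(peptides, n_folds=5, k=8):
    """Return a fold index per peptide; peptides sharing a k-mer share a fold.

    Hypothetical helper for illustration only (not from the paper).
    """
    parent = list(range(len(peptides)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Link every peptide to the first peptide seen with the same k-mer.
    first_seen = {}
    for idx, pep in enumerate(peptides):
        if len(pep) < k:
            kmers = [pep]  # short peptides are matched only as whole strings
        else:
            kmers = [pep[s:s + k] for s in range(len(pep) - k + 1)]
        for kmer in kmers:
            if kmer in first_seen:
                union(idx, first_seen[kmer])
            else:
                first_seen[kmer] = idx

    # Collect the connected groups of peptides.
    clusters = defaultdict(list)
    for idx in range(len(peptides)):
        clusters[find(idx)].append(idx)

    # Greedy balancing: largest group first, into the currently smallest fold.
    fold_sizes = [0] * n_folds
    fold_of = [0] * len(peptides)
    for members in sorted(clusters.values(), key=len, reverse=True):
        target = fold_sizes.index(min(fold_sizes))
        for idx in members:
            fold_of[idx] = target
        fold_sizes[target] += len(members)
    return fold_of


example = ["SIINFEKL", "SIINFEKLA", "ASIINFEKL",
           "GILGFVFTL", "NLVPMVATV", "LGILGFVFTL"]
folds = partition_by_shared_kmer(example, n_folds=3)
print(folds)
```

In this toy input the three SIINFEKL variants end up in one fold and the two GILGFVFTL variants in another. A random per-peptide split would likely scatter them across folds and inflate the measured accuracy.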
Integrating structural features improves predictive power and the identification of peptide residues likely to interact with the MHC. However, our findings indicate that fine-tuning on epitope data yields only a minor accuracy boost. Moreover, transferability between cancer- and pathogen-derived epitopes is limited, suggesting that the two data types have distinct properties.
In conclusion, while transfer learning can enhance T cell epitope prediction, the performance gains are modest and data-type specific. Our final NetMHCpan-4.2 model is publicly accessible at https://services.healthtech.dtu.dk/services/NetMHCpan-4.2, providing a valuable resource for immunological research and therapeutic development.