Fasoulis Romanos, Rigo Mauricio Menegatti, Antunes Dinler Amaral, Paliouras Georgios, Kavraki Lydia E
Department of Computer Science, Rice University, 6100 Main St, Houston, 77005, TX, United States.
Department of Biology and Biochemistry, University of Houston, 4800 Calhoun Rd, Houston, 77004, TX, United States.
Immunoinformatics (Amst). 2024 Mar;13. doi: 10.1016/j.immuno.2023.100030. Epub 2023 Dec 21.
The cellular immune response comprises several processes, the most notable being the binding of the peptide to the Major Histocompatibility Complex (MHC), the presentation of the peptide-MHC (pMHC) complex on the cell surface, and the recognition of the pMHC by the T-cell receptor (TCR). Identifying the most potent peptide targets for MHC binding, presentation, and T-cell recognition is vital for developing peptide-based vaccines and T-cell-based immunotherapies. Data-driven tools that predict each of these steps have been developed, and the availability of mass spectrometry (MS) datasets has facilitated the development of accurate Machine Learning (ML) methods for class-I pMHC binding prediction. However, the accuracy of ML-based tools for pMHC kinetic stability prediction and peptide immunogenicity prediction is uncertain, because stability and immunogenicity datasets are scarce. Here, we use transfer learning techniques to improve stability and immunogenicity predictions by taking advantage of a large number of binding affinity and MS datasets. The resulting models, TLStab and TLImm, exhibit performance comparable to or better than state-of-the-art approaches on various stability and immunogenicity test sets, respectively. Our approach demonstrates the promise of learning from the task of peptide binding to improve predictions on downstream tasks. The source code of TLStab and TLImm is publicly available at https://github.com/KavrakiLab/TL-MHC.
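The general transfer-learning idea described above (learn from an abundant source task, then reuse those parameters on a data-poor target task) can be illustrated with a deliberately simplified sketch. It is not the TLStab or TLImm architecture. It assumes a plain logistic-regression model over a position-wise one-hot encoding of fixed-length 9-mers, uses hypothetical synthetic labeling rules in place of real binding, MS, stability, or immunogenicity data, and implements transfer as warm-starting (fine-tuning) the target model from the pretrained weights. The names `encode`, `predict`, `sgd`, and `toy_dataset` are illustrative only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PEPTIDE_LEN = 9  # assumption: fixed-length 9-mers only
N_FEATURES = PEPTIDE_LEN * len(AMINO_ACIDS)


def encode(peptide):
    """Return the active feature indices of a position-wise one-hot encoding."""
    if len(peptide) != PEPTIDE_LEN or any(aa not in AMINO_ACIDS for aa in peptide):
        raise ValueError(f"expected a 9-mer of standard amino acids, got {peptide!r}")
    return [pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa) for pos, aa in enumerate(peptide)]


def predict(model, peptide):
    """Positive-class probability under a logistic model given as (weights, bias)."""
    weights, bias = model
    z = bias + sum(weights[i] for i in encode(peptide))
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd(data, model=None, lr=0.05, epochs=20, seed=0):
    """Per-example SGD on log loss; if `model` is given, warm-start from a copy of it."""
    if model is None:
        weights, bias = [0.0] * N_FEATURES, 0.0
    else:
        weights, bias = list(model[0]), model[1]  # copy, so the source model is untouched
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            peptide, label = data[k]
            # (label - p) is the negative gradient of the log loss w.r.t. the logit
            err = label - predict((weights, bias), peptide)
            for i in encode(peptide):
                weights[i] += lr * err
            bias += lr * err
    return weights, bias


def toy_dataset(n, rule, seed):
    """HYPOTHETICAL data: random 9-mers labeled by `rule`; not real measurements."""
    rng = random.Random(seed)
    peptides = ["".join(rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LEN)) for _ in range(n)]
    return [(p, 1.0 if rule(p) else 0.0) for p in peptides]


# Source task: large dataset with a simple rule (stand-in for binding/MS data).
source = toy_dataset(2000, lambda p: p[1] in "LM", seed=1)
# Target task: small dataset with a related rule (stand-in for stability data).
target = toy_dataset(100, lambda p: p[1] in "LM" and p[8] not in "DEKR", seed=2)

pretrained = sgd(source, epochs=20)
# Fine-tuning: start from the pretrained parameters, with a smaller step and fewer epochs.
fine_tuned = sgd(target, model=pretrained, lr=0.02, epochs=10)
# Baseline for comparison: the same model trained on the small target set alone.
scratch = sgd(target, epochs=10)

print(predict(fine_tuned, "SLYNTVATL"), predict(scratch, "SLYNTVATL"))
```

Comparing `fine_tuned` against `scratch` on held-out target data is the kind of experiment that tests whether pretraining helps; with toy rules like these, the outcome depends on the chosen seeds and dataset sizes, so no particular result is implied here.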