Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France.
Blavatnik School of Computer Science, Tel Aviv University, 6139601 Tel Aviv, Israel.
Cell Syst. 2021 Feb 17;12(2):195-202.e9. doi: 10.1016/j.cels.2020.11.005. Epub 2020 Dec 17.
The recent increase in immunopeptidomics data, obtained by mass spectrometry or binding assays, opens up possibilities for investigating endogenous antigen presentation by the highly polymorphic human leukocyte antigen class I (HLA-I) protein. State-of-the-art methods predict with high accuracy presentation by HLA alleles that are well represented in databases at the time of release, but they perform worse for rarer and less characterized alleles. Here, we introduce RBM-MHC, a method based on Restricted Boltzmann Machines (RBMs) for predicting antigens presented on the Major Histocompatibility Complex (MHC) encoded by HLA genes. RBM-MHC can be trained on custom and newly available samples with few or no HLA annotations. It improves predictions for rare alleles and matches state-of-the-art performance for well-characterized alleles while requiring less data. RBM-MHC is shown to be a flexible and easily interpretable method that can be used as a predictor of cancer neoantigens and viral epitopes, as a tool for feature discovery, and to reconstruct peptide motifs presented on specific HLA molecules.