Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.
Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, VIC 3168, Australia.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae087.
The major histocompatibility complex (MHC) encodes a range of immune response genes, including the human leukocyte antigens (HLAs) in humans. These molecules bind peptide antigens and present them on the cell surface for T cell recognition. The repertoires of peptides presented by HLA molecules are termed immunopeptidomes. The highly polymorphic nature of the genes that encode the HLA molecules confers allotype-specific differences in the sequences of bound ligands. Allotype-specific ligand preferences are often defined by peptide-binding motifs. Individuals express up to six classical class I HLA allotypes, which likely present peptides displaying different binding motifs. Such complex datasets make it challenging to deconvolute immunopeptidomic data into allotype-specific contributions and to further dissect binding specificities. Herein, we developed MHCpLogics as an interactive machine learning-based tool for mining peptide-binding sequence motifs and visualizing immunopeptidome data across complex datasets. We showcase the functionalities of MHCpLogics by analyzing both in-house and published mono- and multi-allelic immunopeptidomics data. The visualization modalities of MHCpLogics allow users to inspect clustered sequences down to individual peptide components and to examine broader sequence patterns within multiple immunopeptidome datasets. MHCpLogics can deconvolute large immunopeptidome datasets, enabling the interrogation of clusters to segregate allotype-specific peptide sequence motifs, the identification of sub-peptidome motifs, and the export of clustered peptide sequence lists. The tool facilitates rapid inspection of immunopeptidomes and serves as a resource for the immunology and vaccine communities. MHCpLogics is a standalone application distributed as an executable installer at: https://github.com/PurcellLab/MHCpLogics.
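The abstract does not describe the clustering algorithm MHCpLogics uses internally. As a hedged illustration of the general idea of deconvoluting a mixed immunopeptidome into motif clusters, the sketch below substitutes plain k-means on one-hot-encoded 9-mers (a swapped-in technique, not necessarily the tool's method). It then summarizes each cluster by the dominant residue and the information content at P2 and the C-terminal anchor, the quantity drawn as letter-stack height in a sequence logo. The peptides, the function names (`kmeans`, `motif_summary`, and so on), and the choice of k are all hypothetical; real data would also need handling of mixed peptide lengths (typically 8 to 11 residues for class I) and a principled way to choose the number of clusters.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # standard amino acids; sets the maximum information content (bits)


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def position_frequencies(peptides):
    """Position probability matrix: one {residue: frequency} dict per position."""
    n = len(peptides)
    return [{aa: c / n for aa, c in Counter(col).items()} for col in zip(*peptides)]


def sq_distance(peptide, centroid):
    """Squared Euclidean distance between a one-hot-encoded peptide and a centroid,
    computed per position as 1 - 2*f[aa] + ||f||^2 without building full vectors."""
    return sum(
        1.0 - 2.0 * col.get(aa, 0.0) + sum(f * f for f in col.values())
        for aa, col in zip(peptide, centroid)
    )


def kmeans(peptides, k, max_iter=50):
    """Hypothetical k-means over equal-length peptides with farthest-first seeding."""
    seeds = [peptides[0]]
    while len(seeds) < k:
        # Next seed: the peptide farthest (Hamming) from all seeds chosen so far.
        seeds.append(max(peptides, key=lambda p: min(hamming(p, s) for s in seeds)))
    centroids = [position_frequencies([s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_distance(p, centroids[j]))
                      for p in peptides]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(peptides, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = position_frequencies(members)
    return labels


def information_content(col):
    """Shannon information content (bits) of one PPM column, as plotted in a
    sequence logo (no small-sample correction)."""
    entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
    return math.log2(ALPHABET_SIZE) - entropy


def motif_summary(peptides):
    """Dominant residue and information content at P2 and the C-terminal anchor."""
    ppm = position_frequencies(peptides)
    anchors = {"P2": ppm[1], "C-term": ppm[-1]}
    return {name: (max(col, key=col.get), round(information_content(col), 2))
            for name, col in anchors.items()}


if __name__ == "__main__":
    # Hypothetical 9-mers: an x-L-...-V motif mixed with an x-P-...-L motif.
    peptides = ["KLAEVTAYV", "SLFDRGQTV", "ALWGPDAAV", "YLLEMLWRV",
                "GPRNTKEEL", "RPQSTAGFL", "VPKTNSHIL", "TPRHLIEFL"]
    labels = kmeans(peptides, k=2)  # [0, 0, 0, 0, 1, 1, 1, 1]
    for j in sorted(set(labels)):
        members = [p for p, lab in zip(peptides, labels) if lab == j]
        print(j, members, motif_summary(members))
```

Farthest-first seeding keeps this toy example deterministic; on real multi-allelic data, k-means is sensitive to initialization and to the assumed number of allotypes, which is one reason interactive inspection of the resulting clusters is useful.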