Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome.

Affiliations

Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, 1300 York Avenue, New York, NY, USA.

Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medical College, 413 East 69th Street, New York, NY, USA.

Publication information

BMC Bioinformatics. 2019 Jan 5;20(1):7. doi: 10.1186/s12859-018-2561-z.

DOI: 10.1186/s12859-018-2561-z

PMID: 30611210

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6321722/

Abstract

BACKGROUND

To further our understanding of immunopeptidomics, improved tools are needed to identify peptides presented by major histocompatibility complex class I (MHC-I). Many existing tools are limited by their reliance upon chemical affinity data, which is less biologically relevant than sampling by mass spectrometry, and other tools are limited by incomplete exploration of machine learning approaches. Herein, we assemble publicly available data describing human peptides discovered by sampling the MHC-I immunopeptidome with mass spectrometry and use this database to train random forest classifiers (ForestMHC) to predict presentation by MHC-I.
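As an illustration of how peptides might be turned into inputs for a random forest, the sketch below one-hot encodes each position of a peptide. The abstract does not state which encoding ForestMHC uses, so the encoding, the function name `one_hot_encode`, and the example sequence are assumptions made for illustration only.

```python
# Hypothetical sketch: one-hot encode a peptide so that a tree-based classifier
# can split on individual (position, residue) features.
# The encoding is an assumption; it is not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Return a flat 0/1 list of length len(peptide) * 20."""
    features = [0] * (len(peptide) * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide.upper()):
        if residue not in AA_INDEX:
            raise ValueError(
                f"non-standard residue {residue!r} at position {position + 1}"
            )
        # Exactly one feature per position is set to 1.
        features[position * len(AMINO_ACIDS) + AA_INDEX[residue]] = 1
    return features


if __name__ == "__main__":
    vector = one_hot_encode("GILGFVFTL")  # arbitrary 9-residue example
    print(len(vector), sum(vector))  # 180 features, 9 of them set
```

With this layout every feature maps back to a single position, which is one way a later feature-importance analysis could be read per position. Peptides of different lengths produce vectors of different sizes, so in practice one model per length (or some padding scheme) would be needed.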

RESULTS

As measured by precision in the top 1% of predictions, our method outperforms NetMHC and NetMHCpan on test sets, and it outperforms both these methods and MixMHCpred on new data from an ovarian carcinoma cell line. We also find that random forest scores correlate monotonically, but not linearly, with known chemical binding affinities, and an information-based analysis of classifier features shows the importance of anchor positions for our classification. The random-forest approach also outperforms a deep neural network and a convolutional neural network trained on identical data. Finally, we use our large database to confirm that gene expression partially determines peptide presentation.
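The headline metric, precision in the top 1% of predictions, can be sketched as follows. This is a minimal illustration, assuming each candidate peptide has a classifier score and a binary label (1 for an observed presented peptide, 0 for a decoy); the function name, the rounding rule, and the toy data are assumptions, and the abstract does not describe how the paper's test sets were constructed.

```python
def precision_at_top_fraction(scores, labels, fraction=0.01):
    """Precision among the highest-scoring `fraction` of predictions.

    scores: classifier scores, higher means more likely presented.
    labels: 1 for a presented peptide, 0 for a decoy.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one prediction is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Number of predictions kept, rounded to the nearest integer, at least 1.
    k = max(1, round(len(scores) * fraction))

    # Rank by score, best first, and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]

    # Fraction of the top-k predictions that are true positives.
    return sum(label for _, label in top) / k


if __name__ == "__main__":
    # Toy data: 200 candidates, only the single best-scoring one is a true hit.
    scores = [i / 200 for i in range(200)]
    labels = [0] * 199 + [1]
    print(precision_at_top_fraction(scores, labels))  # top 2 contain 1 hit
```

Because only the top-ranked slice is scored, this metric rewards a classifier for placing true ligands at the very top of the ranking and ignores how it orders the remaining candidates.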

CONCLUSIONS

ForestMHC is a promising method to identify peptides bound by MHC-I. We have demonstrated the utility of random forest-based approaches in predicting peptide presentation by MHC-I, assembled the largest known database of MS binding data, and mined this database to show the effect of gene expression on peptide presentation. ForestMHC has potential applicability to basic immunology, rational vaccine design, and neoantigen binding prediction for cancer immunotherapy. This method is publicly available for applications and further validation.

Similar articles

Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome.

BMC Bioinformatics. 2019 Jan 5;20(1):7. doi: 10.1186/s12859-018-2561-z.

Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.

PLoS Comput Biol. 2018 Nov 8;14(11):e1006457. doi: 10.1371/journal.pcbi.1006457. eCollection 2018 Nov.

Precision Neoantigen Discovery Using Large-Scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation.

Mol Cell Proteomics. 2023 Apr;22(4):100506. doi: 10.1016/j.mcpro.2023.100506. Epub 2023 Feb 14.

Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction.

BMC Bioinformatics. 2017 Dec 28;18(1):585. doi: 10.1186/s12859-017-1997-x.

NNAlign_MA; MHC Peptidome Deconvolution for Accurate MHC Binding Motif Characterization and Improved T-cell Epitope Predictions.

Mol Cell Proteomics. 2019 Dec;18(12):2459-2477. doi: 10.1074/mcp.TIR119.001658. Epub 2019 Oct 2.

The interdependence of machine learning and LC-MS approaches for an unbiased understanding of the cellular immunopeptidome.

Expert Rev Proteomics. 2022 Feb;19(2):77-88. doi: 10.1080/14789450.2022.2064278. Epub 2022 Apr 21.

Benchmarking predictions of MHC class I restricted T cell epitopes in a comprehensively studied model system.

PLoS Comput Biol. 2020 May 26;16(5):e1007757. doi: 10.1371/journal.pcbi.1007757. eCollection 2020 May.

RBM-MHC: A Semi-Supervised Machine-Learning Method for Sample-Specific Prediction of Antigen Presentation by HLA-I Alleles.

Cell Syst. 2021 Feb 17;12(2):195-202.e9. doi: 10.1016/j.cels.2020.11.005. Epub 2020 Dec 17.

A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction.

Brief Bioinform. 2020 Jul 15;21(4):1119-1135. doi: 10.1093/bib/bbz051.

NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data.

Nucleic Acids Res. 2020 Jul 2;48(W1):W449-W454. doi: 10.1093/nar/gkaa379.

Cited by

Computational methods and data resources for predicting tumor neoantigens.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf302.

Immunopeptidomics-guided discovery and characterization of neoantigens for personalized cancer immunotherapy.

Sci Adv. 2025 May 23;11(21):eadv6445. doi: 10.1126/sciadv.adv6445. Epub 2025 May 21.

Strategies for neoantigen screening and immunogenicity validation in cancer immunotherapy (Review).

Int J Oncol. 2025 Jun;66(6). doi: 10.3892/ijo.2025.5749. Epub 2025 May 9.

A novel immunoinformatic approach for design and evaluation of heptavalent multiepitope foot-and-mouth disease virus vaccine.

BMC Vet Res. 2025 Mar 7;21(1):152. doi: 10.1186/s12917-025-04509-1.

MHC-I-presented non-canonical antigens expand the cancer immunotherapy targets in acute myeloid leukemia.

Sci Data. 2024 Aug 1;11(1):831. doi: 10.1038/s41597-024-03660-y.

Recent Findings on Therapeutic Cancer Vaccines: An Updated Review.

Biomolecules. 2024 Apr 21;14(4):503. doi: 10.3390/biom14040503.

ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae133.

Informing immunotherapy with multi-omics driven machine learning.

NPJ Digit Med. 2024 Mar 14;7(1):67. doi: 10.1038/s41746-024-01043-6.

Improved prediction of MHC-peptide binding using protein language models.

Front Bioinform. 2023 Aug 17;3:1207380. doi: 10.3389/fbinf.2023.1207380. eCollection 2023.

What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome?

Mol Cell Proteomics. 2023 Sep;22(9):100631. doi: 10.1016/j.mcpro.2023.100631. Epub 2023 Aug 11.

References

Scalable and accurate deep learning with electronic health records.

NPJ Digit Med. 2018 May 8;1:18. doi: 10.1038/s41746-018-0029-1. eCollection 2018.

MHC class I loaded ligands from breast cancer cell lines: A potential HLA-I-typed antigen collection.

J Proteomics. 2018 Mar 30;176:13-23. doi: 10.1016/j.jprot.2018.01.004. Epub 2018 Jan 10.

High-throughput and Sensitive Immunopeptidomics Platform Reveals Profound Interferonγ-Mediated Remodeling of the Human Leukocyte Antigen (HLA) Ligandome.

Mol Cell Proteomics. 2018 Mar;17(3):533-548. doi: 10.1074/mcp.TIR117.000383. Epub 2017 Dec 14.

Expression Atlas: gene and protein expression across multiple studies and organisms.

Nucleic Acids Res. 2018 Jan 4;46(D1):D246-D251. doi: 10.1093/nar/gkx1158.

The SysteMHC Atlas project.

Nucleic Acids Res. 2018 Jan 4;46(D1):D1237-D1247. doi: 10.1093/nar/gkx664.

NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data.

J Immunol. 2017 Nov 1;199(9):3360-3368. doi: 10.4049/jimmunol.1700893. Epub 2017 Oct 4.

Unveiling the Peptide Motifs of HLA-C and HLA-G from Naturally Presented Peptides and Generation of Binding Prediction Matrices.

J Immunol. 2017 Oct 15;199(8):2639-2651. doi: 10.4049/jimmunol.1700938. Epub 2017 Sep 13.

Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity.

PLoS Comput Biol. 2017 Aug 23;13(8):e1005725. doi: 10.1371/journal.pcbi.1005725. eCollection 2017 Aug.

Antigen presentation profiling reveals recognition of lymphoma immunoglobulin neoantigens.

Nature. 2017 Mar 30;543(7647):723-727. doi: 10.1038/nature21433. Epub 2017 Mar 22.

Mass Spectrometry Profiling of HLA-Associated Peptidomes in Mono-allelic Cells Enables More Accurate Epitope Prediction.

Immunity. 2017 Feb 21;46(2):315-326. doi: 10.1016/j.immuni.2017.02.007.
