MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences.

Affiliations

Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbab595.

DOI: 10.1093/bib/bbab595
PMID: 35443027
Abstract

Predicting the binding between peptides and major histocompatibility complex (MHC) molecules plays a vital role in cancer immunotherapy. AlphaFold's success in applying natural language processing (NLP) algorithms to protein secondary structure prediction inspired us to explore whether NLP methods can predict peptide-MHC class I binding. With this motivation, we propose MHCRoBERTa, a method based on the RoBERTa pre-training approach, for predicting the binding affinity between MHC class I molecules and peptides. Results on a benchmark dataset show that MHCRoBERTa outperforms other state-of-the-art prediction methods, with a higher Spearman rank correlation coefficient (SRCC). Notably, our model gives a significant improvement in IC50 prediction. Our method achieves an SRCC of 0.785 and an AUC of 0.817. Its SRCC is 14.3% higher than that of NetMHCpan3.0 (the second-highest SRCC among pan-specific methods) and 3% higher than that of MHCflurry (the second-highest SRCC among all methods). Its AUC is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention over the token representations across layers and heads. By analyzing the representation in each layer and head, we can show whether the model has learned the syntax and semantics needed to perform the prediction task well. Together, these results demonstrate that our model can accurately predict peptide-MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open-source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
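
The abstract reports two evaluation metrics, SRCC and AUC. The following is a minimal sketch of how such metrics can be computed from measured and predicted IC50 values; it is not the authors' evaluation code. The toy data, the function names, and the 500 nM binder threshold are assumptions made for illustration (the threshold is a common convention in this field, not something the abstract states).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rank_correlation(x, y):
    """Spearman correlation: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; labels are True for binders."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, is_pos in zip(ranks, labels) if is_pos)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data, invented purely for illustration (not from the paper).
measured_ic50_nm = [12.0, 85.0, 430.0, 2500.0, 18000.0, 40000.0]
predicted_ic50_nm = [30.0, 60.0, 900.0, 700.0, 25000.0, 9000.0]

BINDER_THRESHOLD_NM = 500.0  # assumed convention, not taken from the abstract

srcc = spearman_rank_correlation(predicted_ic50_nm, measured_ic50_nm)
is_binder = [ic50 < BINDER_THRESHOLD_NM for ic50 in measured_ic50_nm]
# Lower predicted IC50 means stronger binding, so negate it to get a score.
auc = roc_auc([-p for p in predicted_ic50_nm], is_binder)

print(f"SRCC = {srcc:.3f}, AUC = {auc:.3f}")
```

SRCC depends only on the ordering of the predictions, so it is unaffected by any monotonic rescaling of IC50 (for example, a log transform). AUC additionally requires choosing a cutoff that separates binders from non-binders.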

Similar articles

1. MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences.
   Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbab595.
2. ACME: pan-specific peptide-MHC class I binding prediction through attention-based deep neural networks.
   Bioinformatics. 2019 Dec 1;35(23):4946-4954. doi: 10.1093/bioinformatics/btz427.
3. Deep learning pan-specific model for interpretable MHC-I peptide binding prediction with improved attention mechanism.
   Proteins. 2021 Jul;89(7):866-883. doi: 10.1002/prot.26065. Epub 2021 Mar 18.
4. Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction.
   BMC Bioinformatics. 2017 Dec 28;18(1):585. doi: 10.1186/s12859-017-1997-x.
5. Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
   PLoS Comput Biol. 2018 Nov 8;14(11):e1006457. doi: 10.1371/journal.pcbi.1006457. eCollection 2018 Nov.
6. HLA class I binding prediction via convolutional neural networks.
   Bioinformatics. 2017 Sep 1;33(17):2658-2665. doi: 10.1093/bioinformatics/btx264.
7. High-Throughput MHC I Ligand Prediction Using MHCflurry.
   Methods Mol Biol. 2020;2120:113-127. doi: 10.1007/978-1-0716-0327-7_8.
8. RPEMHC: improved prediction of MHC-peptide binding affinity by a deep learning approach based on residue-residue pair encoding.
   Bioinformatics. 2024 Jan 2;40(1). doi: 10.1093/bioinformatics/btad785.
9. MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model.
   Bioinformatics. 2020 Jul 1;36(Suppl_1):i399-i406. doi: 10.1093/bioinformatics/btaa479.
10. USMPep: universal sequence models for major histocompatibility complex binding affinity prediction.
    BMC Bioinformatics. 2020 Jul 2;21(1):279. doi: 10.1186/s12859-020-03631-1.

Cited by

1. Application of artificial intelligence large language models in drug target discovery.
   Front Pharmacol. 2025 Jul 8;16:1597351. doi: 10.3389/fphar.2025.1597351. eCollection 2025.
2. Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.
   Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.
3. Beyond digital twins: the role of foundation models in enhancing the interpretability of multiomics modalities in precision medicine.
   FEBS Open Bio. 2025 Aug;15(8):1192-1208. doi: 10.1002/2211-5463.70003. Epub 2025 Feb 24.
4. The clinical application of artificial intelligence in cancer precision treatment.
   J Transl Med. 2025 Jan 27;23(1):120. doi: 10.1186/s12967-025-06139-5.
5. Transformers meets neoantigen detection: a systematic literature review.
   J Integr Bioinform. 2024 Jul 4;21(2). doi: 10.1515/jib-2023-0043. eCollection 2024 Jun 1.
6. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy.
   Front Immunol. 2024 May 29;15:1394003. doi: 10.3389/fimmu.2024.1394003. eCollection 2024.
7. Transfer learning improves pMHC kinetic stability and immunogenicity predictions.
   Immunoinformatics (Amst). 2024 Mar;13. doi: 10.1016/j.immuno.2023.100030. Epub 2023 Dec 21.
8. Informing immunotherapy with multi-omics driven machine learning.
   NPJ Digit Med. 2024 Mar 14;7(1):67. doi: 10.1038/s41746-024-01043-6.
9. AbImmPred: An immunogenicity prediction method for therapeutic antibodies using AntiBERTy-based sequence features.
   PLoS One. 2024 Feb 23;19(2):e0296737. doi: 10.1371/journal.pone.0296737. eCollection 2024.
10. Advancing bioinformatics with large language models: components, applications and perspectives.
    ArXiv. 2025 Jan 31:arXiv:2401.04155v2.