MHCSeqNet2: improved peptide-class I MHC binding prediction for alleles with low data.

Author information

Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand.

Center of Excellence in Computational Molecular Biology, Division of Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand.

Publication information

Bioinformatics. 2024 Jan 2;40(1). doi: 10.1093/bioinformatics/btad780.

Abstract

MOTIVATION

The binding of a peptide antigen to a Class I major histocompatibility complex (MHC) protein is part of a key process that lets the immune system recognize an infected cell or a cancer cell. This mechanism has enabled the development of peptide-based vaccines that can activate the patient's immune response to treat cancers. Hence, the ability to accurately predict peptide-MHC binding is an essential component for prioritizing the best peptides for each patient. However, peptide-MHC binding experimental data for many MHC alleles are still lacking, which limits the accuracy of existing prediction models.

RESULTS

In this study, we present an improved version of MHCSeqNet that uses sub-word-level peptide features, a 3D structure embedding for MHC alleles, and an expanded training dataset to achieve better generalizability on MHC alleles with small amounts of data. Visualization of MHC allele embeddings confirms that the model is able to group alleles with similar binding specificity, including those with no peptide ligand in the training dataset. Furthermore, an external evaluation suggests that MHCSeqNet2 can improve the prioritization of T cell epitopes for MHC alleles with small amounts of training data.
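
As a rough illustration of what "sub-word-level peptide features" could mean, the sketch below splits a peptide into overlapping k-mers. This is an assumption made for illustration only: the abstract does not specify the tokenization scheme, and the function name `peptide_subwords` and the choice of k = 3 are hypothetical and not taken from the MHCSeqNet2 code.

```python
def peptide_subwords(peptide: str, k: int = 3) -> list[str]:
    """Split a peptide into overlapping k-mers.

    Hypothetical sub-word scheme for illustration; the tokenization
    actually used by MHCSeqNet2 is not described in the abstract.
    """
    peptide = peptide.upper()
    if len(peptide) < k:
        # Peptides shorter than k are kept as a single token.
        return [peptide]
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


# Example: an 8-residue peptide yields six overlapping 3-mers.
tokens = peptide_subwords("SIINFEKL", k=3)
print(tokens)
```

Each token could then be mapped to an embedding vector, so that peptides sharing short motifs share features even when the full sequences differ.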

AVAILABILITY AND IMPLEMENTATION

The source code and installation instructions for MHCSeqNet2 are available at https://github.com/cmb-chula/MHCSeqNet2.
