Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling.

Affiliations

Discovery Sciences, Novartis Institutes for Biomedical Research, Basel 4056, Switzerland.

NIBR Research Informatics, Novartis Institutes for Biomedical Research, Basel 4056, Switzerland.

Publication information

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad469.

Abstract

MOTIVATION

Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in evaluating the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the in silico safety assessment of biologics and engineered derivatives and the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning advances in protein language modeling have shown potential for leveraging large collections of sequence data to improve MHC presentation prediction.

RESULTS

Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including data from a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms improves model performance. We trained variants of the model with and without MHCII information. In both variants, including peptides presented by the I-Ag7 MHC class II molecule, which is expressed by nonobese diabetic mice, enabled for the first time accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, with promising therapeutic applications.
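As a rough illustration of what training variants "with and without MHCII information" can mean at the input level, the Python sketch below encodes a peptide either on its own or concatenated with an MHCII pseudo-sequence. Everything here is an assumption for illustration: the vocabulary layout, the special tokens ([PAD], [CLS], [SEP], [UNK]), the function names `encode` and `attention_mask`, and the toy sequences are hypothetical and are not taken from the AEGIS code at https://github.com/Novartis/AEGIS, which should be consulted for the actual input format.

```python
from typing import List, Optional

# Hypothetical vocabulary: four special tokens followed by the 20 standard amino acids.
# This layout is an illustrative assumption, not the tokenizer used by AEGIS.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}
PAD_ID = VOCAB["[PAD]"]


def encode(peptide: str, mhc_pseudoseq: Optional[str] = None, max_len: int = 64) -> List[int]:
    """Encode a peptide, optionally paired with an MHCII pseudo-sequence.

    Peptide-only variant:  [CLS] peptide [SEP]
    Peptide + MHC variant: [CLS] peptide [SEP] mhc [SEP]
    Residues outside the 20 standard amino acids map to [UNK];
    the result is right-padded with [PAD] up to max_len.
    """
    tokens = ["[CLS]"] + list(peptide.upper()) + ["[SEP]"]
    if mhc_pseudoseq is not None:
        tokens += list(mhc_pseudoseq.upper()) + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError(f"{len(tokens)} tokens exceed max_len={max_len}")
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    return ids + [PAD_ID] * (max_len - len(ids))


def attention_mask(ids: List[int]) -> List[int]:
    """Return 1 for real tokens and 0 for padding positions."""
    return [0 if i == PAD_ID else 1 for i in ids]


if __name__ == "__main__":
    # "ACD" is a toy peptide; "Y" is a placeholder, not a real I-Ag7 pseudo-sequence.
    print(encode("ACD", max_len=8))       # [1, 4, 5, 6, 2, 0, 0, 0]
    print(encode("ACD", "Y", max_len=8))  # [1, 4, 5, 6, 2, 23, 2, 0]
```

Concatenating the two sequences with a separator is one common way to let a single encoder serve both variants. A model trained on peptide-only inputs cannot tell alleles apart, so it can only learn the presentation preferences of whichever MHCII molecules dominate its training data.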

AVAILABILITY AND IMPLEMENTATION

The source code is available at https://github.com/Novartis/AEGIS.
