University of Cincinnati, 3333 Burnet Ave, MLC7024, Cincinnati, OH 45267, USA.
University of Cincinnati, USA.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab160.
Cytolytic T-cells play an essential role in the adaptive immune system by seeking out, binding and killing cells that present foreign antigens on their surface. An improved understanding of T-cell immunity will greatly aid in the development of new cancer immunotherapies and vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods to predict which non-native peptides elicit a T-cell response; however, we currently lack accurate immunogenicity inference methods. Another challenge is accurately simulating immunogenic peptides for specific human leukocyte antigen alleles, both for synthetic biological applications and to augment real training datasets. Here, we propose a beta-binomial distribution approach to derive peptide immunogenic potential from sequence alone. We conducted systematic benchmarking of five traditional machine learning models (ElasticNet, K-nearest neighbors, support vector machine, Random Forest and AdaBoost) and three deep learning models (convolutional neural network (CNN), Residual Net and graph neural network) using three independent, previously validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN as the best prediction model, based on its adaptability to small and large datasets and its performance relative to existing methods. In addition to outperforming two widely used immunogenicity prediction algorithms, DeepImmuno-CNN correctly predicts which residues are most important for T-cell antigen recognition and predicts novel impacts of SARS-CoV-2 variants. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides with physicochemical properties and immunogenicity predictions similar to those of real antigens. We provide DeepImmuno-CNN as source code and an easy-to-use web interface.
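The abstract does not spell out how the beta-binomial approach is formulated. As a rough illustration only, one common way to use a beta-binomial model is to treat the number of responding subjects among those tested as binomial, place a Beta prior on the response rate, and take the posterior mean as a continuous score. The sketch below assumes that reading; the function name `immunogenic_potential`, the count-based inputs and the uniform Beta(1, 1) prior are all hypothetical choices, not details taken from the paper.

```python
def immunogenic_potential(n_responded, n_tested, alpha=1.0, beta=1.0):
    """Posterior-mean response rate under a Beta(alpha, beta) prior.

    Assumption: an assay reports n_responded positive subjects out of
    n_tested. The prior parameters are illustrative defaults, not values
    from the paper.
    """
    if n_tested < 0 or not 0 <= n_responded <= n_tested:
        raise ValueError("need 0 <= n_responded <= n_tested")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Beta-binomial conjugacy: posterior is Beta(alpha + k, beta + n - k),
    # whose mean is (alpha + k) / (alpha + beta + n).
    return (n_responded + alpha) / (n_tested + alpha + beta)


# Same raw response rate (75%), different amounts of evidence.
few_subjects = immunogenic_potential(3, 4)
many_subjects = immunogenic_potential(75, 100)
untested = immunogenic_potential(0, 0)
```

Under these assumptions, a peptide tested in few subjects is pulled toward the prior mean more strongly than one tested in many, so two peptides with the same raw response rate can receive different scores. An untested peptide simply receives the prior mean.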