Li Guangyuan, Iyer Balaji, Prasath V B Surya, Ni Yizhao, Salomonis Nathan
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA.
bioRxiv. 2020 Dec 24:2020.12.24.424262. doi: 10.1101/2020.12.24.424262.
T cells play an essential role in the adaptive immune system by seeking out, binding and destroying foreign antigens presented on the surface of diseased cells. An improved understanding of T cell immunity will greatly aid the development of new cancer immunotherapies and of vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods that predict which non-native epitopes will elicit a T cell response; however, accurate immunogenicity inference methods are currently lacking. Another challenge is to accurately simulate immunogenic peptides for specific human leukocyte antigen (HLA) alleles, both for synthetic biology applications and to augment real training datasets. Here, we propose a beta-binomial distribution approach to derive epitope immunogenic potential from sequence alone. We systematically benchmarked five traditional machine learning models (ElasticNet, KNN, SVM, Random Forest, AdaBoost) and three deep learning models (CNN, ResNet, GNN) using three independent, previously validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN as the best prediction model based on its adaptability to both small and large datasets and on its performance relative to existing methods. In addition to outperforming two widely used immunogenicity prediction algorithms, DeepHLApan and IEDB, DeepImmuno-CNN also correctly predicts which residues are most important for T cell antigen recognition. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides whose physicochemical properties and immunogenicity predictions are similar to those of real antigens. We provide DeepImmuno-CNN as source code and through an easy-to-use web interface.
DeepImmuno Python3 code is available at https://github.com/frankligy/DeepImmuno. The DeepImmuno web portal is available at https://deepimmuno.herokuapp.com. The data in this article are available on GitHub and in the supplementary materials.
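The abstract names a beta-binomial approach for deriving immunogenic potential but does not spell out the calculation. The following is a minimal sketch of one common way such a score can be formed, not the authors' implementation. It assumes that each peptide-HLA pair comes with a count of subjects tested and a count of subjects who responded, and that a Beta prior is placed on the unknown response rate; the score is then the posterior mean. The function name `immunogenic_potential`, its parameters, and the uniform Beta(1, 1) default prior are all hypothetical choices made for illustration.

```python
def immunogenic_potential(responders, tested, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean response rate under a Beta prior and binomial likelihood.

    Hypothetical helper for illustration only. With a Beta(prior_alpha,
    prior_beta) prior and `responders` positives out of `tested` trials, the
    posterior is Beta(prior_alpha + responders, prior_beta + tested - responders),
    whose mean is returned here.
    """
    if tested < 0 or not 0 <= responders <= tested:
        raise ValueError("need 0 <= responders <= tested")
    if prior_alpha <= 0 or prior_beta <= 0:
        raise ValueError("Beta prior parameters must be positive")
    return (responders + prior_alpha) / (tested + prior_alpha + prior_beta)


# Illustrative counts (made up, not taken from the paper's data):
# a peptide tested once with one responder is shrunk toward the prior mean,
# while a peptide with more evidence stays closer to its raw response rate.
single_test = immunogenic_potential(1, 1)    # raw rate 1.0
well_tested = immunogenic_potential(8, 10)   # raw rate 0.8
untested = immunogenic_potential(0, 0)       # falls back to the prior mean
```

The point of this kind of scoring is that a raw ratio treats 1 of 1 and 100 of 100 as equally immunogenic, whereas the posterior mean discounts peptides supported by few tested subjects. A stronger or data-fitted prior would change the amount of shrinkage.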