Artificial Intelligence Research Institute, Moscow, Russia.
Sber AI Lab, Moscow, Russia.
Front Immunol. 2022 Sep 15;13:960985. doi: 10.3389/fimmu.2022.960985. eCollection 2022.
One of the primary tasks in vaccine design and the development of immunotherapeutic drugs is to predict conformational B-cell epitopes, which correspond to primary antibody binding sites within the antigen tertiary structure. Multiple approaches have been developed to address this problem, but their accuracy remains limited for a wide range of antigens. In this paper, we applied transfer learning with pretrained deep learning models to develop a model that predicts conformational B-cell epitopes from the primary antigen sequence and tertiary structure. A pretrained protein language model, ESM-1v, and an inverse folding model, ESM-IF1, were fine-tuned to quantitatively predict antibody-antigen interaction features and to distinguish between epitope and non-epitope residues. The resulting model, SEMA, outperformed peer-reviewed tools on an independent test set, achieving a ROC AUC of 0.76. We show that SEMA can quantitatively rank the immunodominant regions within the SARS-CoV-2 receptor-binding domain (RBD). SEMA is available at https://github.com/AIRI-Institute/SEMAi and through a web interface at http://sema.airi.net.
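The ROC AUC figure reported above measures how well per-residue scores separate epitope from non-epitope residues. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `roc_auc`, the labels, and the scores are all hypothetical toy values chosen for illustration, and the computation uses the standard pairwise (Mann-Whitney) formulation of ROC AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive residue
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one epitope and one non-epitope residue")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = epitope residue, 0 = non-epitope residue.
labels = [0, 0, 1, 1, 0, 1, 0, 0]
# Hypothetical per-residue scores, standing in for a model's predictions.
scores = [0.10, 0.40, 0.80, 0.35, 0.20, 0.90, 0.05, 0.60]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.76. The pairwise loop is quadratic in the number of residues; for large test sets a rank-based implementation from an established library would be the more practical choice.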