Department of Biosciences, University of Salzburg, 5020 Salzburg, Austria.
Int J Mol Sci. 2020 Dec 22;22(1):12. doi: 10.3390/ijms22010012.
Knowledge of MHC II binding peptides is highly desired in immunological research, particularly in the context of cancer, autoimmune diseases, and allergies. The most successful prediction methods rely on machine learning models trained on sequences of experimentally characterized binding peptides. Here, we describe a complementary approach called MHCII3D, which is based on structural scaffolds of MHC II-peptide complexes and statistical scoring functions (SSFs). The MHC II alleles reported in the Immuno Polymorphism Database are processed in a dedicated 3D-modeling pipeline that provides a set of scaffold complexes for each distinct allotype sequence. Antigen protein sequences are threaded through the scaffolds and evaluated by optimized SSFs. We compared the predictive power of MHCII3D with that of different sequence-based machine learning methods. The Pearson correlation with experimentally determined IC50 values for the MHC II Automated Server Benchmarks data sets from IEDB (Immune Epitope Database) is 0.42, which is within the range of the competing methods. We show that MHCII3D is quite robust in leave-one-molecule-out tests and is therefore not prone to overfitting. Finally, we provide evidence that MHCII3D can complement current sequence-based methods and help to identify problematic entries in IEDB. Scaffolds and MHCII3D executables can be freely downloaded from our web pages.
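The threading step described in the abstract (antigen sequences slid through a scaffold and each placement scored) can be illustrated with a minimal Python sketch. This is not the MHCII3D implementation: the function names, the 9-residue window, and the scoring rule are all assumptions made for illustration. In particular, `toy_score` is a hypothetical stand-in that only counts hydrophobic residues at the commonly cited anchor positions P1, P4, P6, and P9, whereas the SSFs in the paper are evaluated on 3D scaffold complexes.

```python
from typing import Callable, List, Tuple

CORE_LENGTH = 9  # assumed window size: the MHC II binding core spans nine residues


def thread_windows(sequence: str, length: int = CORE_LENGTH) -> List[Tuple[int, str]]:
    """Return every (start index, window) pair of the given length."""
    if length <= 0:
        raise ValueError("length must be positive")
    # A sequence shorter than the window yields an empty list.
    return [(i, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


def toy_score(window: str) -> float:
    """Hypothetical placeholder score (lower is better).

    Counts hydrophobic residues at anchor positions P1, P4, P6, P9.
    A real statistical scoring function would use the scaffold structure.
    """
    anchors = (0, 3, 5, 8)  # zero-based indices of P1, P4, P6, P9
    hydrophobic = set("AVILMFWY")
    return -float(sum(window[p] in hydrophobic for p in anchors))


def rank_windows(
    sequence: str,
    score: Callable[[str], float] = toy_score,
    length: int = CORE_LENGTH,
) -> List[Tuple[float, int, str]]:
    """Score every window and return (score, start, window) sorted best first."""
    scored = [(score(w), i, w) for i, w in thread_windows(sequence, length)]
    return sorted(scored)
```

Calling `rank_windows(antigen_sequence)` returns all candidate cores ordered by the placeholder score; swapping in a different `score` callable is the point where a structure-based function would plug in.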