
Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.

Affiliation

Global Research IT, Merck & Co., Inc., Boston, MA, United States of America.

Publication information

PLoS Comput Biol. 2018 Nov 8;14(11):e1006457. doi: 10.1371/journal.pcbi.1006457. eCollection 2018 Nov.

Abstract

A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by the evolving training data and machine learning methods. Despite recent advances in generating high-quality MHC-eluted, naturally processed ligandomes, the reliability of new predictors on these epitopes has yet to be evaluated. This study reports the latest benchmarking of an extensive set of MHC-binding predictors using newly available, untested data on both synthetic and naturally processed epitopes. The blind test set includes 32 human leukocyte antigen (HLA) class I and 24 HLA class II alleles. Artificial neural network (ANN)-based approaches demonstrated better performance than regression-based machine learning and structural modeling. Among the 18 predictors benchmarked, ANN-based mhcflurry and nn_align perform the best for MHC class I 9-mer and class II 15-mer predictions, respectively, on binding/non-binding classification (area under the curve = 0.911). NetMHCpan4 also demonstrated comparable predictive power. Our customization of mhcflurry to a pan-HLA predictor achieved accuracy similar to NetMHCpan. The overall accuracy of these methods is comparable between 9-mer and 10-mer testing data. However, the top methods deliver low correlations between the predicted and the experimental affinities for strong MHC binders. When used on naturally processed MHC ligands, tools that have been trained on elution data (NetMHCpan4 and MixMHCpred) show better accuracy than pure binding affinity predictors. The false prediction rate varies considerably among HLA types and datasets. Finally, the structure-based predictor Rosetta FlexPepDock performs less well than the machine learning approaches. With our benchmarking of MHC-binding and MHC-elution predictors using a comprehensive set of metrics, an unbiased view for establishing best practice of T-cell epitope predictions is presented, facilitating future development of methods in immunogenomics.
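
To make the binding/non-binding classification metric concrete, here is a minimal sketch of how an area under the ROC curve could be computed from measured and predicted affinities. It is not the study's evaluation code. The 500 nM IC50 cutoff for labelling a peptide as a binder is a commonly used convention assumed here, not a value stated in the abstract; the affinity values are invented toy numbers, and the function and variable names are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random binder outscores a random non-binder.

    Ties between a binder and a non-binder count as half a win, which
    matches the usual rank-based definition of the ROC AUC.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Assumption: peptides with measured IC50 <= 500 nM are labelled binders.
BINDER_CUTOFF_NM = 500.0

# Toy data (invented for illustration): IC50 values in nM.
measured_ic50 = [12.0, 45.0, 300.0, 800.0, 5000.0, 20000.0]
predicted_ic50 = [30.0, 20.0, 900.0, 400.0, 7000.0, 15000.0]

labels = [1 if m <= BINDER_CUTOFF_NM else 0 for m in measured_ic50]
# A lower predicted IC50 means stronger predicted binding, so negate it
# to obtain a score where higher means "more likely a binder".
scores = [-p for p in predicted_ic50]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every binder is ranked above every non-binder, and 0.5 corresponds to random ranking. Because the AUC depends only on rank order, a predictor can score well on it while still correlating poorly with measured affinities within the subset of strong binders.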

