Fraunhofer Heinrich Hertz Institute, Einsteinufer 37, Berlin, 10587, Germany.
BMC Bioinformatics. 2020 Jul 2;21(1):279. doi: 10.1186/s12859-020-03631-1.
Immunotherapy is a promising route towards personalized cancer treatment. A key algorithmic challenge in this process is to decide whether a given peptide (neoepitope) binds to the major histocompatibility complex (MHC). This is an active area of research, and many MHC binding prediction algorithms can predict the MHC binding affinity of a given peptide with a high degree of accuracy. However, most state-of-the-art approaches use complicated training and model selection procedures, are restricted to peptides of a certain length, and/or rely on heuristics.
We put forward USMPep, a simple recurrent neural network that matches state-of-the-art approaches on MHC class I binding prediction with a single, generic architecture and even a single set of hyperparameters, both on IEDB benchmark datasets and on the very recent HPV dataset. Moreover, the algorithm is already competitive as a single model trained from scratch, while ensembling multiple regressors and language model pretraining can still slightly improve the performance. Applied directly to MHC class II binding prediction, the approach shows solid performance despite limited training data.
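To illustrate why a recurrent model needs no length-specific handling, and what ensembling multiple regressors amounts to, the following is a minimal sketch and not the USMPep implementation. It assumes a plain Elman-style recurrence with random, untrained weights in place of whatever recurrent cell and training procedure the authors use. The names `TinyRecurrentRegressor` and `ensemble_predict`, the hidden size, and the sigmoid output are all hypothetical choices made for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TOKEN = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


class TinyRecurrentRegressor:
    """Hypothetical Elman-style RNN regressor with random, untrained weights."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_in = matrix(len(AMINO_ACIDS), hidden_size)  # one row per residue
        self.w_rec = matrix(hidden_size, hidden_size)      # hidden-to-hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def predict(self, peptide):
        """Return a score in (0, 1) for a peptide of any length."""
        h = [0.0] * self.hidden_size
        for aa in peptide:  # one recurrent step per residue
            x = self.w_in[TOKEN[aa]]
            h = [
                math.tanh(
                    x[j] + sum(h[k] * self.w_rec[k][j]
                               for k in range(self.hidden_size))
                )
                for j in range(self.hidden_size)
            ]
        score = sum(hj * wj for hj, wj in zip(h, self.w_out))
        return 1.0 / (1.0 + math.exp(-score))  # squash to (0, 1)


def ensemble_predict(models, peptide):
    """Average the predictions of several independently seeded regressors."""
    return sum(m.predict(peptide) for m in models) / len(models)


if __name__ == "__main__":
    models = [TinyRecurrentRegressor(seed=s) for s in range(3)]
    # The 11-mer below is an arbitrary string chosen only to vary the length.
    for peptide in ["SIINFEKL", "GILGFVFTL", "ACDEFGHIKLM"]:
        print(len(peptide), peptide, round(ensemble_predict(models, peptide), 3))
```

Because the hidden state is updated once per residue, the same weights score 8-mers, 9-mers, and longer peptides alike. The scores printed here are meaningless until the weights are fitted to measured binding affinities.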
We demonstrate that competitive performance in MHC binding affinity prediction can be reached with a standard architecture and training procedure without relying on any heuristics.