Koyama Kyohei, Hashimoto Kosuke, Nagao Chioko, Mizuguchi Kenji
Laboratory for Computational Biology, Institute for Protein Research, Osaka University, Osaka, Japan.
National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan.
Front Bioinform. 2023 Dec 18;3:1274599. doi: 10.3389/fbinf.2023.1274599. eCollection 2023.
Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated only by internal retrospective validation, and few studies have incorporated an attention layer from language models and tested it against structural information. Therefore, in this study, we developed a machine learning model based on a modified version of the Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region 3 (CDR3) and the peptide. This model achieved competitive performance on a benchmark dataset of TCR-pMHC interactions, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the highly attended residues, such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge of molecular recognition and pave the way for designing new therapeutics.
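The source-target attention mentioned in the abstract, and the split of residues into large- and small-attention groups, can be illustrated with a minimal sketch. This is not the authors' model. It assumes a single attention head, random (untrained) residue embeddings, no learned projection matrices, and a hypothetical grouping rule (mean plus one standard deviation of the attention each CDR3 residue receives). The function names and the two example sequences are illustrative placeholders.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq, table):
    """Look up one embedding row per residue (table is an assumed 20 x d matrix)."""
    return table[[AMINO_ACIDS.index(a) for a in seq]]


def cross_attention(query, key, value):
    """Scaled dot-product attention where queries and keys come from different sequences."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each query row sums to 1
    return weights @ value, weights


def split_by_attention(weights, z=1.0):
    """Flag key residues whose total received attention exceeds mean + z * std.

    The threshold rule is an assumption made for illustration only.
    """
    received = weights.sum(axis=0)  # attention received by each key residue
    threshold = received.mean() + z * received.std()
    return received > threshold


rng = np.random.default_rng(0)
table = rng.normal(size=(len(AMINO_ACIDS), 16))  # random stand-in for trained embeddings

cdr3 = "CASSLGQAYEQYF"  # illustrative CDR3 sequence
peptide = "GILGFVFTL"   # illustrative peptide sequence

# The peptide residues act as queries; the CDR3 residues act as keys and values.
out, w = cross_attention(embed(peptide, table), embed(cdr3, table), embed(cdr3, table))
large = split_by_attention(w)

print("attention matrix shape:", w.shape)
print("large-attention CDR3 positions:", np.flatnonzero(large).tolist())
```

In a trained model the embeddings and projections would be learned, and the resulting large-attention positions could then be compared against structural features such as hydrogen bonds, as the abstract describes.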