Jiang Yuepeng, Huo Miaozhe, Cheng Li Shuai
Department of Computer Science, City University of Hong Kong.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad086.
The adaptive immune response to foreign antigens is initiated by T-cell receptor (TCR) recognition of the antigens. Recent experimental advances have enabled the generation of large amounts of data on TCRs and their cognate antigenic targets, allowing machine learning models to predict the binding specificity of TCRs. In this work, we present TEINet, a deep learning framework that utilizes transfer learning to address this prediction problem. TEINet employs two separately pretrained encoders to transform TCR and epitope sequences into numerical vectors, which are subsequently fed into a fully connected neural network to predict their binding specificities. A major challenge for binding specificity prediction is the lack of a unified approach to sampling negative data. Here, we first assess the current negative sampling approaches comprehensively and suggest that the Unified Epitope strategy is the most suitable one. Subsequently, we compare TEINet with three baseline methods and observe that TEINet achieves an average AUROC of 0.760, which outperforms the baseline methods by 6.4-26%. Furthermore, we investigate the impact of the pretraining step and find that excessive pretraining may lower the transferability of the encoders to the final prediction task. Our results and analysis show that TEINet can make accurate predictions using only the TCR sequence (CDR3$\beta$) and the epitope sequence, providing novel insights into the interactions between TCRs and epitopes.
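The data flow described above (two encoders, numerical vectors, a fully connected network) can be illustrated with a minimal sketch. This is not the TEINet implementation: the abstract does not specify the encoder architecture, how the two vectors are combined, or the layer sizes. Here `encode_sequence` is a hypothetical stand-in (an amino-acid composition vector) for the pretrained encoders, the two vectors are assumed to be concatenated, the weights are random and untrained, and the example sequences are purely illustrative inputs.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_sequence(seq):
    """Hypothetical stand-in encoder: 20-dim amino-acid composition vector.

    TEINet uses separately pretrained encoders at this step; this function
    only mimics the interface (sequence in, fixed-length vector out).
    """
    total = max(len(seq), 1)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def init_layer(n_in, n_out, rng):
    """Create one fully connected layer with random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply a fully connected layer: y = W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def predict_binding(tcr, epitope, hidden, output):
    """Return a score in (0, 1) for a TCR-epitope pair."""
    # Assumption: the two encoded vectors are concatenated.
    features = encode_sequence(tcr) + encode_sequence(epitope)
    activations = [max(0.0, v) for v in dense(features, hidden)]  # ReLU
    logit = dense(activations, output)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


rng = random.Random(0)
hidden = init_layer(40, 16, rng)   # 20 + 20 input features, 16 hidden units
output = init_layer(16, 1, rng)    # single binding logit

# Illustrative CDR3-beta and epitope sequences; the weights are untrained,
# so the score carries no biological meaning.
score = predict_binding("CASSIRSSYEQYF", "GILGFVFTL", hidden, output)
print(f"binding score: {score:.3f}")
```

In a transfer learning setup of the kind the abstract describes, the stand-in encoder would be replaced by the pretrained sequence encoders, and the fully connected layers would be trained on labelled positive and negative TCR-epitope pairs.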