AI Lab, Tencent, Viseen Business Park, Gaoxin 9th South Road, 518057 Shenzhen, China.
School of Informatics, Xiamen University, South Siming Road 422, 361005 Xiamen, China.
Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad191.
Accurately predicting the antigen-binding specificity of adaptive immune receptors (AIRs), such as T-cell receptors (TCRs) and B-cell receptors (BCRs), is essential for discovering new immune therapies. However, the diversity of AIR chain sequences limits the accuracy of current prediction methods. This study introduces SC-AIR-BERT, a pre-trained model that learns comprehensive sequence representations of paired AIR chains to improve binding specificity prediction. SC-AIR-BERT first learns the 'language' of AIR sequences through self-supervised pre-training on a large cohort of paired AIR chains drawn from multiple single-cell resources. The model is then fine-tuned with a multilayer perceptron (MLP) head for binding specificity prediction, using a K-mer strategy to enhance sequence representation learning. Extensive experiments show that SC-AIR-BERT achieves a higher AUC than current methods for both TCR- and BCR-binding specificity prediction.
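The K-mer strategy mentioned above can be illustrated with a minimal sketch of how a paired receptor might be turned into tokens for a BERT-style model. It assumes overlapping k-mers with stride 1, k = 3, and BERT-style [CLS]/[SEP] tokens separating the two chains. The function names, the special-token set, and the example CDR3 strings are hypothetical and are not taken from the SC-AIR-BERT implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical special tokens, following common BERT conventions (an assumption).
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]


def kmer_tokenize(seq: str, k: int = 3) -> List[str]:
    """Split an amino-acid sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(seq) < k:
        # Too short for a full k-mer: keep the whole sequence as a single token.
        return [seq] if seq else []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode_pair(chain_a: str, chain_b: str, k: int = 3) -> List[str]:
    """Build a token list for a paired receptor: [CLS] chain A [SEP] chain B [SEP]."""
    return (["[CLS]"] + kmer_tokenize(chain_a, k) + ["[SEP]"]
            + kmer_tokenize(chain_b, k) + ["[SEP]"])


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign integer ids: special tokens first, then k-mers in order of first appearance."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to [UNK] for k-mers not in the vocabulary."""
    unk = vocab["[UNK]"]
    return [vocab.get(tok, unk) for tok in tokens]


if __name__ == "__main__":
    # Illustrative (hypothetical) alpha/beta CDR3 fragments.
    tokens = encode_pair("CAVR", "CASS")
    vocab = build_vocab([tokens])
    print(tokens)                 # ['[CLS]', 'CAV', 'AVR', '[SEP]', 'CAS', 'ASS', '[SEP]']
    print(to_ids(tokens, vocab))  # [1, 5, 6, 2, 7, 8, 2]
```

Overlapping k-mers let each token carry a short local motif rather than a single residue, at the cost of a larger vocabulary; the choice of k here is only for illustration.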