Department of Biosystems Science and Engineering, ETH Zurich, Schanzenstrasse 44, 4056 Basel, Switzerland.
Life Science Zurich Graduate School, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae375.
Effective clustering of T-cell receptor (TCR) sequences could be used to predict their antigen specificities. However, TCRs with highly dissimilar sequences can bind to the same antigen, which makes clustering them into a common antigen group a central challenge. Here, we develop TouCAN, a method that relies on contrastive learning and pretrained protein language models to perform TCR sequence clustering and antigen-specificity prediction. Following training, TouCAN is able to cluster highly dissimilar TCRs into common antigen groups. In addition, its TCR clustering performance and antigen-specificity predictions are comparable to those of other leading methods in the field.
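To make the idea of contrastive learning over sequence embeddings concrete, the following is a minimal sketch of a generic supervised contrastive loss. It is not the TouCAN implementation: the function name `supervised_contrastive_loss`, the `temperature` value, and the toy 2-dimensional vectors (stand-ins for embeddings that a pretrained protein language model would produce) are all assumptions made for illustration. The loss is low when sequences sharing an antigen label lie close together in embedding space and high when they do not.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not TouCAN's code).

    embeddings: (n, d) array, one vector per TCR sequence (hypothetical input).
    labels:     (n,) array of antigen labels; equal labels form positive pairs.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Numerically stable log-sum-exp over all samples except the anchor itself.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positive pairs: same antigen label, different sample.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0  # skip anchors that have no positive partner

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[has_pos] / pos_counts[has_pos]
    return float(-mean_log_prob_pos.mean())

# Toy data: two antigen groups, labelled 0 and 1 (made-up values).
toy_labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # same label, nearby vectors
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # same label, distant vectors

clustered_loss = supervised_contrastive_loss(clustered, toy_labels)
mixed_loss = supervised_contrastive_loss(mixed, toy_labels)
print(clustered_loss < mixed_loss)
```

In a training setting, minimising a loss of this kind would pull embeddings of TCRs that bind the same antigen together even when their raw sequences differ, after which a standard clustering algorithm could be applied to the learned embedding space.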