Hudson Dan, Lubbock Alex, Basham Mark, Koohy Hashem
MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
The Rosalind Franklin Institute, Didcot, UK.
Immunoinformatics (Amst). 2024 Mar;13. doi: 10.1016/j.immuno.2024.100033.
The vast potential sequence diversity of TCRs and their ligands has presented an historic barrier to computational prediction of TCR epitope specificity, a holy grail of quantitative immunology. One common approach is to cluster sequences together, on the assumption that similar receptors bind similar epitopes. Here, we provide the first independent evaluation of widely used clustering algorithms for TCR specificity inference, observing some variability in predictive performance between models, and marked differences in scalability. Despite these differences, we find that different algorithms produce clusters with high degrees of similarity for receptors recognising the same epitope. Our analysis strengthens the case for use of clustering models to identify signals of common specificity from large repertoires, whilst highlighting scope for improvement of complex models over simple comparators.