Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS).
Department of Computer Science and Mathematics, ADREM Data Lab.
Bioinformatics. 2019 May 1;35(9):1461-1468. doi: 10.1093/bioinformatics/bty821.
The T-cell receptor (TCR) is responsible for recognizing epitopes presented on cell surfaces. Linking TCR sequences to their ability to target specific epitopes is an unsolved problem of great interest. In particular, it is unknown how dissimilar TCR sequences can be before they no longer bind the same epitope. This question is confounded by the fact that there are many ways to define the similarity between two TCR sequences. Here we investigate both issues in the context of unsupervised clustering of TCR sequences.
We provide an overview of the performance of various distance metrics on two large independent datasets with 412 and 2835 TCR sequences, respectively. Our results confirm the presence of structurally distinct TCR groups that target identical epitopes. In addition, we put forward several recommendations for performing unsupervised clustering of T-cell receptor sequences.
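To make the idea of clustering TCR sequences by a distance metric concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes plain Levenshtein (edit) distance as the metric and a simple threshold-based single-linkage grouping. The function names, the `max_dist` parameter, and the toy sequences are all illustrative assumptions.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_by_threshold(seqs, max_dist=1):
    """Group sequences into connected components of the graph that links
    any two sequences whose edit distance is at most max_dist
    (single-linkage clustering with a fixed cut-off)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if levenshtein(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)


# Toy amino-acid strings for illustration only (not data from the study).
toy_seqs = [
    "CASSLAPGATNEKLFF",
    "CASSLAPGTTNEKLFF",
    "CASSIRSSYEQYF",
    "CASSIRSAYEQYF",
    "CAWSVGQGYEQYF",
]
clusters = cluster_by_threshold(toy_seqs, max_dist=1)
for cluster in clusters:
    print(cluster)
```

The pairwise comparison is quadratic in the number of sequences, which is acceptable for datasets of a few thousand sequences but would need indexing or a faster metric at larger scale. The choice of metric and cut-off is exactly the kind of decision the distance-metric comparison is meant to inform.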
Source code implemented in Python 3 is available at https://github.com/pmeysman/TCRclusteringPaper.
Supplementary data are available at Bioinformatics online.