Mölder Felix, Stervbo Ulrik, Loyal Lucie, Bacher Petra, Babel Nina, Rahmann Sven
Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, 45147 Essen, Germany.
Institute of Pathology, University of Duisburg-Essen, 45147 Essen, Germany.
Bioinformatics. 2021 Oct 25;37(20):3444-3448. doi: 10.1093/bioinformatics/btab361.
Clustering T-cell receptor repertoire (TCRR) sequences according to antigen specificity is challenging. The previously published tool GLIPH needs several days to weeks to cluster large repertoires, which makes it impractical for larger studies. In addition, the methodology used in GLIPH has shortcomings, including non-determinism, potential loss of significant antigen-specific sequences, and inclusion of too many nonspecific sequences.
We present an algorithm for clustering TCRR sequences that scales efficiently to large repertoires. We clustered 36 real datasets with up to 62 000 unique CDR3β sequences using three tools: ting (an implementation of our method), GLIPH, and its successor GLIPH2. While GLIPH required multiple weeks, ting needed only about one minute for the same task. GLIPH2 is comparably fast but uses a different grouping paradigm. In addition, we found that in naïve repertoires, where no or very few antigen-specific CDR3 sequences or clusters should exist, our method indeed selects far fewer motifs and produces smaller clusters.
Our method has been implemented in Python as a tool called ting. It is available from GitHub (https://github.com/FelixMoelder/ting) or PyPI under the MIT license.
Supplementary data are available at Bioinformatics online.