Research Programs Unit, Translational Immunology, University of Helsinki, Helsinki, Finland.
Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland.
BMC Bioinformatics. 2021 Mar 25;22(1):159. doi: 10.1186/s12859-021-04087-7.
Deep immune receptor sequencing (RepSeq) provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, owing to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited either to "public" CDR3 sequences or to comparisons of CDR3 frequencies observed within a single individual. A methodology for identifying condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking.
We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. Finally, it ranks the CDR3s in differentially abundant sub-repertoires by their relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets from celiac disease patients, as well as to public datasets from yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement between the TRBV-gene and positional amino acid usage patterns of the detected CDR3β sequences and those of previously known gluten-specific CDR3βs in celiac disease. It also recovered significantly more previously known CDR3β sequences relevant to each condition than would be expected by chance.
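The abstract does not specify the clustering algorithm, the cross-sample matching criterion, or the statistical test, so the Python sketch below only illustrates the four steps on toy count tables under loudly stated assumptions: CDR3s are grouped within a sample by single-linkage on a Hamming distance of at most 1 (equal lengths only), clusters are matched across samples when their most abundant CDR3s are within that same distance, differential abundance is assessed with an exact two-sided permutation test on the mean sub-repertoire frequency, and CDR3s in case-enriched sub-repertoires are ranked by their summed counts in case samples. All function names (`hamming_le1`, `single_linkage`, `find_sub_repertoires`, `abundance`, `permutation_p`, `rank_condition_cdr3s`) are hypothetical, TRBV-gene information is ignored, and this is not the authors' implementation.

```python
# Hypothetical sketch of a sub-repertoire comparison pipeline; every similarity
# rule and test below is an assumption, not the published method.
from collections import defaultdict
from itertools import combinations
from statistics import mean


def hamming_le1(a, b):
    """Assumed similarity rule: equal length and at most one mismatched residue."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def single_linkage(items, linked):
    """Connected components of the `linked` relation, via union-find over all pairs."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if linked(items[i], items[j]):
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, item in enumerate(items):
        groups[find(i)].append(item)
    return list(groups.values())


def find_sub_repertoires(samples):
    """Step 1: cluster CDR3s within each sample; step 2: match clusters across samples."""
    clusters = []  # entries: (sample_id, representative CDR3, member CDR3s)
    for sid, counts in samples.items():
        for members in single_linkage(sorted(counts), hamming_le1):
            rep = max(members, key=lambda s: (counts[s], s))  # most abundant member
            clusters.append((sid, rep, members))
    # Clusters (from any samples) with similar representatives form one sub-repertoire.
    return single_linkage(clusters, lambda c, d: hamming_le1(c[1], d[1]))


def abundance(sub_rep, samples):
    """Fraction of each sample's total CDR3 count that falls inside the sub-repertoire."""
    frac = {sid: 0.0 for sid in samples}
    for sid, _, members in sub_rep:
        total = sum(samples[sid].values())
        frac[sid] += sum(samples[sid][m] for m in members) / total
    return frac


def permutation_p(values, cases):
    """Step 3 (assumed test): exact two-sided permutation test on the difference in means."""
    ids = sorted(values)

    def diff(group):
        return (mean(values[s] for s in group)
                - mean(values[s] for s in ids if s not in group))

    observed = diff(cases)
    splits = list(combinations(ids, len(cases)))  # every relabelling of the samples
    hits = sum(abs(diff(g)) >= abs(observed) - 1e-12 for g in splits)  # float tolerance
    return hits / len(splits), observed


def rank_condition_cdr3s(samples, cases, alpha=0.05):
    """Step 4: rank CDR3s from case-enriched sub-repertoires by summed case counts."""
    scores = defaultdict(int)
    for sub_rep in find_sub_repertoires(samples):
        p, delta = permutation_p(abundance(sub_rep, samples), cases)
        if p <= alpha and delta > 0:  # keep only sub-repertoires enriched in cases
            for sid, _, members in sub_rep:
                if sid in cases:
                    for m in members:
                        scores[m] += samples[sid][m]
    return sorted(scores, key=lambda m: (-scores[m], m))
```

Because the exact test enumerates every way of relabelling the samples, it is only practical for small cohorts; a realistic analysis would more likely use a sampled permutation test or a count-based model, and would correct for multiple testing across the many sub-repertoires.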
We conclude that immune sub-repertoires with similar immunogenomic features, shared across unrelated individuals, can serve as viable units of immune repertoire comparison and as a proxy for identifying condition-associated CDR3s.