Mark Chernyshev, Aron Stålmarck, Martin Corcoran, Gunilla B. Karlsson Hedestam, Ben Murrell
Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.
bioRxiv. 2025 Feb 26:2025.02.21.638809. doi: 10.1101/2025.02.21.638809.
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) has emerged as a central approach for studying T cell and B cell receptor populations, and is now an important component of studies of autoimmunity, immune responses to pathogens, vaccines, allergens, and cancers, and for antibody discovery. When amplifying the rearranged V(D)J genes encoding antigen receptors, each cycle of the Polymerase Chain Reaction (PCR) can produce spurious "chimeric" hybrids of two or more different template sequences. While the generation of chimeras is well understood in bacterial and viral sequencing, and there are dedicated tools to detect such sequences in bacterial and viral datasets, this is not the case for AIRR-seq. Further, the process that results in immune receptor sequences has domain-specific challenges, such as somatic hypermutation (SHM), and domain-specific opportunities, such as relatively well-known germline gene "reference" sequences. Here we describe CHMMAIRRa, a hidden Markov model for detecting chimeric sequences in AIRR-seq data, that specifically models SHM and incorporates germline reference sequences. We use simulations to characterize the performance of CHMMAIRRa and compare it to existing methods from other domains, we test the effect of PCR conditions on chimerism using IgM libraries generated in this study, and we apply CHMMAIRRa to four published AIRR-seq datasets to show the extent and impact of artifactual chimerism.
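The general idea of using a hidden Markov model to flag chimeras against germline references can be illustrated with a toy example. The sketch below is not CHMMAIRRa's model or API. It is a minimal Viterbi decoder written under loud simplifying assumptions: the query and all references are already aligned and of equal length, there are no gaps, SHM is approximated by a single uniform per-base substitution rate, and switching between references has a fixed per-position probability. The function names (`viterbi_reference_path`, `is_chimeric`) and the parameters (`mutation_rate`, `switch_prob`) are hypothetical and chosen for illustration only.

```python
import math


def viterbi_reference_path(query, refs, mutation_rate=0.05, switch_prob=0.001):
    """Return the most likely reference index for each query position.

    Toy model (assumption, not the published method): each hidden state is
    one germline reference, emissions allow point mutations at a uniform
    rate, and transitions allow a rare switch to a different reference.
    """
    n_refs = len(refs)
    if n_refs < 2:
        raise ValueError("need at least two references to model a switch")
    if not query or any(len(r) != len(query) for r in refs):
        raise ValueError("query and references must be non-empty and pre-aligned")

    log_match = math.log(1.0 - mutation_rate)
    log_mismatch = math.log(mutation_rate / 3.0)  # 3 alternative bases
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (n_refs - 1))

    def emit(k, i):
        # Log-probability of the observed base given reference k at position i.
        return log_match if refs[k][i] == query[i] else log_mismatch

    # Initialise with a uniform prior over references.
    scores = [math.log(1.0 / n_refs) + emit(k, 0) for k in range(n_refs)]
    backpointers = []

    for i in range(1, len(query)):
        new_scores, pointers = [], []
        for k in range(n_refs):
            # Best previous state for arriving in state k at position i.
            best_prev = max(
                range(n_refs),
                key=lambda j: scores[j] + (log_stay if j == k else log_switch),
            )
            step = log_stay if best_prev == k else log_switch
            new_scores.append(scores[best_prev] + step + emit(k, i))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_refs), key=lambda k: scores[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def is_chimeric(query, refs, **kwargs):
    """Flag a query whose most likely path uses more than one reference."""
    return len(set(viterbi_reference_path(query, refs, **kwargs))) > 1
```

In this toy setting, a read built from the first half of one reference and the second half of another is decoded as a path with a single switch, while a read that differs from one reference by an isolated substitution stays on that reference, because explaining one mismatch as a mutation is far cheaper than two switches. That trade-off between a mutation rate and a switch probability is the intuition behind modelling SHM explicitly; the actual parameterization used by CHMMAIRRa is not given in the abstract and is not reproduced here.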