Gabernet Gisela, Marquez Susanna, Bjornson Robert, Peltzer Alexander, Meng Hailong, Aron Edel, Lee Noah Yann, Jensen Cole, Ladd David, Hanssen Friederike, Heumos Simon, Yaari Gur, Kowarik Markus C, Nahnsen Sven, Kleinstein Steven H
bioRxiv. 2024 Jan 28:2024.01.18.576147. doi: 10.1101/2024.01.18.576147.
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool for studying the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and to infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability, or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow that integrates the Immcantation Framework and follows best practices for BCR and TCR sequencing data analysis. The Immcantation Framework is a comprehensive toolset that supports the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community-contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19-infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge under the MIT license on GitHub (https://github.com/nf-core/airrflow). Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).