Biological Sciences, Simon Fraser University, Burnaby, BC, Canada.
Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA.
Methods Mol Biol. 2022;2453:447-476. doi: 10.1007/978-1-0716-2115-8_23.
High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to study the adaptive immune response via large-scale experiments. Since 2009, AIRR sequencing (AIRR-seq) has been widely applied to survey the immune state of individuals (see "The AIRR Community Guide to Repertoire Analysis" chapter for details). One of the goals of the AIRR Community is to make the resulting AIRR-seq data FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. Sci Data 3:1-9, 2016), so that the research community can easily reuse them (Breden et al. Front Immunol 8:1418, 2017; Scott and Breden. Curr Opin Syst Biol 24:71-77, 2020). The basis for this is the MiAIRR data standard (Rubelt et al. Nat Immunol 18:1274-1278, 2017). For long-term preservation, it is recommended that researchers store their sequence read data in an INSDC repository. At the same time, the AIRR Community has established the AIRR Data Commons (Christley et al. Front Big Data 3:22, 2020), a distributed set of AIRR-compliant repositories that store the critically important annotated AIRR-seq data based on the MiAIRR standard. This makes the data findable, interoperable, and, because they are annotated, more valuable for reuse. Here, we build on the other AIRR Community chapters and illustrate how these principles and standards can be incorporated into AIRR-seq data analysis workflows. We discuss the importance of careful metadata curation for ensuring reproducibility and facilitating data sharing and reuse, and we illustrate how data can be shared via the AIRR Data Commons.