Data Sharing and Reuse: A Method by the AIRR Community.

Affiliations

Biological Sciences, Simon Fraser University, Burnaby, BC, Canada.

Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA.

Publication information

Methods Mol Biol. 2022;2453:447-476. doi: 10.1007/978-1-0716-2115-8_23.

DOI: 10.1007/978-1-0716-2115-8_23

PMID: 35622339

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9761493/

Abstract

High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to study the adaptive immune response via large-scale experiments. Since 2009, AIRR sequencing (AIRR-seq) has been widely applied to survey the immune state of individuals (see "The AIRR Community Guide to Repertoire Analysis" chapter for details). One of the goals of the AIRR Community is to make the resulting AIRR-seq data FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. Sci Data 3:1-9, 2016), with a primary goal of making it easy for the research community to reuse AIRR-seq data (Breden et al. Front Immunol 8:1418, 2017; Scott and Breden. Curr Opin Syst Biol 24:71-77, 2020). The basis for this is the MiAIRR data standard (Rubelt et al. Nat Immunol 18:1274-1278, 2017). For long-term preservation, it is recommended that researchers store their sequence read data in an INSDC repository. At the same time, the AIRR Community has established the AIRR Data Commons (Christley et al. Front Big Data 3:22, 2020), a distributed set of AIRR-compliant repositories that store the critically important annotated AIRR-seq data based on the MiAIRR standard, making the data findable and interoperable and, because the data are annotated, more valuable for reuse. Here, we build on the other AIRR Community chapters and illustrate how these principles and standards can be incorporated into AIRR-seq data analysis workflows. We discuss the importance of careful curation of metadata to ensure reproducibility and facilitate data sharing and reuse, and we illustrate how data can be shared via the AIRR Data Commons.
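Data in the AIRR Data Commons are findable because each AIRR-compliant repository exposes the ADC API (Christley et al. 2020). As a minimal sketch, assuming the ADC API's JSON filter syntax (an `op` plus a `content` object holding `field` and `value`) and a `/airr/v1/repertoire` endpoint, the Python below builds, but does not send, a query for repertoires from one study and one locus. The base URL, the study ID `PRJNA000000`, and all helper names are hypothetical placeholders; field names should be checked against the ADC API specification before use.

```python
import json
import urllib.request

# Hypothetical repository base URL (placeholder, not a real ADC repository).
# The /airr/v1 prefix is assumed to follow the ADC API convention.
BASE_URL = "https://adc.example.org/airr/v1"


def eq_filter(field: str, value: str) -> dict:
    """Build one equality filter in the (assumed) ADC API filter syntax."""
    return {"op": "=", "content": {"field": field, "value": value}}


def and_filter(*filters: dict) -> dict:
    """Combine several filters with a logical AND."""
    return {"op": "and", "content": list(filters)}


def build_repertoire_query(study_id: str, locus: str, size: int = 10) -> dict:
    """Query body selecting repertoires from one study for one PCR target locus."""
    return {
        "filters": and_filter(
            eq_filter("study.study_id", study_id),
            eq_filter("sample.pcr_target.pcr_target_locus", locus),
        ),
        "size": size,  # maximum number of records returned per request
    }


def make_request(query: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to the repertoire endpoint."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/repertoire",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical study ID used only for illustration.
    query = build_repertoire_query("PRJNA000000", "IGH")
    req = make_request(query)
    print(req.get_method(), req.full_url)
    print(json.dumps(query, indent=2))
    # Against a real ADC repository, the request could be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     result = json.load(resp)
```

Keeping the query as plain dictionaries separates the filter logic from the transport, so the same body could be sent to any ADC-compliant repository by changing only `BASE_URL`.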


Similar articles

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Front Immunol. 2018 Aug 16;9:1877. doi: 10.3389/fimmu.2018.01877. eCollection 2018.

iReceptor: A platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories.

Immunol Rev. 2018 Jul;284(1):24-41. doi: 10.1111/imr.12666.

The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons.

Front Big Data. 2020 Jun 17;3:22. doi: 10.3389/fdata.2020.00022. eCollection 2020.

AIRR Community Standardized Representations for Annotated Immune Repertoires.

Front Immunol. 2018 Sep 28;9:2206. doi: 10.3389/fimmu.2018.02206. eCollection 2018.

Adaptive Immune Receptor Repertoire (AIRR) Community Guide to TR and IG Gene Annotation.

Methods Mol Biol. 2022;2453:279-296. doi: 10.1007/978-1-0716-2115-8_16.

The adaptive immune receptor repertoire community as a model for FAIR stewardship of big immunology data.

Curr Opin Syst Biol. 2020 Dec;24:71-77. doi: 10.1016/j.coisb.2020.10.001. Epub 2020 Oct 10.

AIRR Community Guide to Planning and Performing AIRR-Seq Experiments.

Methods Mol Biol. 2022;2453:261-278. doi: 10.1007/978-1-0716-2115-8_15.

nf-core/airrflow: An adaptive immune receptor repertoire analysis workflow employing the Immcantation framework.

PLoS Comput Biol. 2024 Jul 26;20(7):e1012265. doi: 10.1371/journal.pcbi.1012265. eCollection 2024 Jul.

Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data.

Front Immunol. 2017 Nov 1;8:1418. doi: 10.3389/fimmu.2017.01418. eCollection 2017.

Cited by

The systematic assessment of completeness of public metadata accompanying omics studies in the Gene Expression Omnibus data repository.

Genome Biol. 2025 Sep 9;26(1):274. doi: 10.1186/s13059-025-03725-0.

The systematic assessment of completeness of public metadata accompanying omics studies in the Gene Expression Omnibus.

bioRxiv. 2025 Jul 7:2021.11.22.469640. doi: 10.1101/2021.11.22.469640.

Leveraging artificial intelligence and machine learning to accelerate discovery of disease-modifying therapies in type 1 diabetes.

Diabetologia. 2025 Mar;68(3):477-494. doi: 10.1007/s00125-024-06339-6. Epub 2024 Dec 19.

The Type 1 Diabetes T Cell Receptor and B Cell Receptor Repository in the AIRR Data Commons: a practical guide for access, use and contributions through the Type 1 Diabetes AIRR Consortium.

Diabetologia. 2025 Jan;68(1):186-202. doi: 10.1007/s00125-024-06298-y. Epub 2024 Oct 29.

References

The Future of Blood Testing Is the Immunome.

Front Immunol. 2021 Mar 15;12:626793. doi: 10.3389/fimmu.2021.626793. eCollection 2021.

The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons.

Front Big Data. 2020 Jun 17;3:22. doi: 10.3389/fdata.2020.00022. eCollection 2020.

Neurological Manifestations of COVID-19 Feature T Cell Exhaustion and Dedifferentiated Monocytes in Cerebrospinal Fluid.

Immunity. 2021 Jan 12;54(1):164-175.e6. doi: 10.1016/j.immuni.2020.12.011. Epub 2020 Dec 23.

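The adaptive immune receptor repertoire community as a model for FAIR stewardship of big immunology data.

Curr Opin Syst Biol. 2020 Dec;24:71-77. doi: 10.1016/j.coisb.2020.10.001. Epub 2020 Oct 10.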

TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function.

Nucleic Acids Res. 2021 Jan 8;49(D1):D468-D474. doi: 10.1093/nar/gkaa796.

Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences.

Bioinformatics. 2020 Mar 1;36(6):1731-1739. doi: 10.1093/bioinformatics/btz845.

Tools for fundamental analysis functions of TCR repertoires: a systematic comparison.

Brief Bioinform. 2020 Sep 25;21(5):1706-1716. doi: 10.1093/bib/bbz092.

PIRD: Pan Immune Repertoire Database.

Bioinformatics. 2020 Feb 1;36(3):897-903. doi: 10.1093/bioinformatics/btz614.

Standardized next-generation sequencing of immunoglobulin and T-cell receptor gene recombinations for MRD marker identification in acute lymphoblastic leukaemia; a EuroClonality-NGS validation study.

Leukemia. 2019 Sep;33(9):2241-2253. doi: 10.1038/s41375-019-0496-7. Epub 2019 Jun 26.

The Pipeline Repertoire for Ig-Seq Analysis.

Front Immunol. 2019 Apr 30;10:899. doi: 10.3389/fimmu.2019.00899. eCollection 2019.
