Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States.
Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States.
Front Immunol. 2018 Aug 16;9:1877. doi: 10.3389/fimmu.2018.01877. eCollection 2018.
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed Minimal Information about Adaptive Immune Receptor Repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to find, access, interoperate with, and reuse the data sets. To improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies that use data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases.
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
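To make the idea of ontology-controlled entries concrete, the following is a minimal sketch in Python of validating a metadata record against controlled terms before submission. It is not CEDAR's or CAIRR's actual implementation or API: the field names, the `CONTROLLED_TERMS` table, the `validate_record` function, and the `EX:` term identifiers are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical controlled vocabularies: field name -> {label: term ID}.
# The "EX:" identifiers are placeholders, not real ontology identifiers.
CONTROLLED_TERMS = {
    "organism": {"homo sapiens": "EX:0001", "mus musculus": "EX:0002"},
    "tissue": {"blood": "EX:0101", "spleen": "EX:0102"},
}


def validate_record(record):
    """Check each controlled field and link accepted values to term IDs.

    Returns a tuple (linked, errors): `linked` holds the fields whose values
    matched a controlled term, `errors` lists the problems found.
    """
    linked, errors = {}, []
    for field, vocabulary in CONTROLLED_TERMS.items():
        value = record.get(field)
        if value is None:
            errors.append(f"missing required field: {field}")
            continue
        label = value.strip()
        # Match case-insensitively so "Homo sapiens" and "homo sapiens" agree.
        term_id = vocabulary.get(label.lower())
        if term_id is None:
            errors.append(f"{field}: '{value}' is not a controlled term")
        else:
            linked[field] = {"label": label, "term_id": term_id}
    return linked, errors


if __name__ == "__main__":
    linked, errors = validate_record(
        {"organism": "Homo sapiens", "tissue": "PBMCs"}
    )
    print(linked)
    print(errors)
```

In this sketch a free-text value such as "PBMCs" is rejected rather than silently accepted, which is the kind of inconsistency that terminology validation at submission time is meant to catch.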