Immune Repertoire Analysis on High-Performance Computing Using VDJServer V1: A Method by the AIRR Community.

Affiliations

Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA.

Center for Translational Medicine, Immunology, and Transplantation, Immundiagnostik, Marien Hospital Herne, University Hospital of the Ruhr-University Bochum, Herne, Germany.

Publication information

Methods Mol Biol. 2022;2453:439-446. doi: 10.1007/978-1-0716-2115-8_22.

DOI: 10.1007/978-1-0716-2115-8_22

PMID: 35622338

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9761903/

Abstract

AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20-30 million 2 × 300 bp paired-end sequence reads, which roughly corresponds to 15 GB of sequence data to be processed. Other platforms like NextSeq, which is useful in projects where the full V gene is not needed, create about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early analysis steps like preprocessing and gene annotation that process the majority of the sequence data. A standard desktop PC may take 3-5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required.

VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer has parallelized execution for tools such as IgBLAST, so more compute resources are utilized as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer's high-performance computing.
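As a rough check on the data volumes quoted in the abstract, the base counts can be worked out directly from the read counts and read lengths. The sketch below is illustrative only: the function name `sequence_bases` is hypothetical, and equating one base with one byte is an assumption (it ignores FASTQ quality scores, read headers, and compression, all of which change the size on disk).

```python
def sequence_bases(read_pairs: int, read_length: int) -> int:
    """Total bases in a paired-end run: two reads per pair, each read_length long."""
    return read_pairs * 2 * read_length


# MiSeq figures from the abstract: 20-30 million pairs of 2 x 300 bp reads.
miseq_low = sequence_bases(20_000_000, 300)
miseq_high = sequence_bases(30_000_000, 300)

# NextSeq figure from the abstract: about 400 million pairs of 2 x 150 bp reads.
nextseq = sequence_bases(400_000_000, 150)

# Assumption: roughly one byte per base, so gigabases approximate gigabytes.
for label, bases in [
    ("MiSeq (low)", miseq_low),
    ("MiSeq (high)", miseq_high),
    ("NextSeq", nextseq),
]:
    print(f"{label}: {bases / 1e9:.0f} gigabases")
```

Under these assumptions a MiSeq run spans about 12 to 18 gigabases, which brackets the roughly 15 GB stated in the abstract, while a NextSeq run of the quoted size comes to about 120 gigabases.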

Similar articles

VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

Front Immunol. 2018 May 8;9:976. doi: 10.3389/fimmu.2018.00976. eCollection 2018.

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Front Immunol. 2018 Aug 16;9:1877. doi: 10.3389/fimmu.2018.01877. eCollection 2018.

nf-core/airrflow: An adaptive immune receptor repertoire analysis workflow employing the Immcantation framework.

PLoS Comput Biol. 2024 Jul 26;20(7):e1012265. doi: 10.1371/journal.pcbi.1012265. eCollection 2024 Jul.

Accelerating single molecule localization microscopy through parallel processing on a high-performance computing cluster.

J Microsc. 2019 Feb;273(2):148-160. doi: 10.1111/jmi.12772. Epub 2018 Dec 3.

Data Sharing and Reuse: A Method by the AIRR Community.

Methods Mol Biol. 2022;2453:447-476. doi: 10.1007/978-1-0716-2115-8_23.

JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

PLoS One. 2015 Aug 17;10(8):e0134273. doi: 10.1371/journal.pone.0134273. eCollection 2015.

ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community.

BMC Bioinformatics. 2020 Aug 21;21(Suppl 10):352. doi: 10.1186/s12859-020-03565-8.

A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset.

BMC Biol. 2024 Jan 25;22(1):13. doi: 10.1186/s12915-024-01820-5.

VDJML: a file format with tools for capturing the results of inferring immune receptor rearrangements.

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):333. doi: 10.1186/s12859-016-1214-3.

References

The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons.

Front Big Data. 2020 Jun 17;3:22. doi: 10.3389/fdata.2020.00022. eCollection 2020.

Mapping the immunogenic landscape of near-native HIV-1 envelope trimers in non-human primates.

PLoS Pathog. 2020 Aug 31;16(8):e1008753. doi: 10.1371/journal.ppat.1008753. eCollection 2020 Aug.

AIRR Community Standardized Representations for Annotated Immune Repertoires.

Front Immunol. 2018 Sep 28;9:2206. doi: 10.3389/fimmu.2018.02206. eCollection 2018.

VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

Front Immunol. 2018 May 8;9:976. doi: 10.3389/fimmu.2018.00976. eCollection 2018.

VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data.

BMC Bioinformatics. 2017 Oct 11;18(1):448. doi: 10.1186/s12859-017-1853-z.

VDJML: a file format with tools for capturing the results of inferring immune receptor rearrangements.

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):333. doi: 10.1186/s12859-016-1214-3.

Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data.

Bioinformatics. 2015 Oct 15;31(20):3356-8. doi: 10.1093/bioinformatics/btv359. Epub 2015 Jun 10.

pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires.

Bioinformatics. 2014 Jul 1;30(13):1930-2. doi: 10.1093/bioinformatics/btu138. Epub 2014 Mar 10.

IMGT/V-QUEST: IMGT standardized analysis of the immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences.

Cold Spring Harb Protoc. 2011 Jun 1;2011(6):695-715. doi: 10.1101/pdb.prot5633.
