CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching.

Affiliations

Department of Informatics, University of Oslo, 0316 Oslo, Norway.

Department of Microbiology, Oslo University Hospital, 0424 Oslo, Norway.

Publication information

Bioinformatics. 2022 Sep 2;38(17):4230-4232. doi: 10.1093/bioinformatics/btac505.

DOI: 10.1093/bioinformatics/btac505

PMID: 35852318

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9438946/

Abstract

MOTIVATION

Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching.

RESULTS

CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications.
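To make the notion of pairwise overlap concrete, the following is a minimal, naive sketch of exact and approximate matching between repertoires. It is not CompAIRR's algorithm or interface: the function names, the toy CDR3 sequences, the donor labels, the simplified overlap definition (a count of distinct shared sequences) and the use of Hamming distance for approximate matching are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def exact_overlap(rep_a, rep_b):
    """Count distinct sequences present in both repertoires (exact matching)."""
    return len(set(rep_a) & set(rep_b))


def hamming_within(s, t, d):
    """Return True if s and t have equal length and differ in at most d positions."""
    if len(s) != len(t):
        return False
    return sum(a != b for a, b in zip(s, t)) <= d


def approx_overlap(rep_a, rep_b, d=1):
    """Count cross-repertoire sequence pairs within Hamming distance d.

    Naive all-against-all comparison, quadratic in repertoire size;
    shown only to define the quantity, not as an efficient method.
    """
    set_a, set_b = set(rep_a), set(rep_b)
    return sum(hamming_within(a, b, d) for a in set_a for b in set_b)


def overlap_matrix(repertoires, overlap=exact_overlap):
    """Compute the overlap for every unordered pair of named repertoires."""
    names = sorted(repertoires)
    return {
        (x, y): overlap(repertoires[x], repertoires[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy repertoires (made-up CDR3 amino acid sequences).
repertoires = {
    "donor1": ["CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSIRSSYEQYF"],
    "donor2": ["CASSLGTDTQYF", "CASSPGQGNEQYF", "CATSDRGYEQYF"],
    "donor3": ["CASSIRSSYEQYF", "CASSLGTDTQYF"],
}

exact = overlap_matrix(repertoires)
approx = overlap_matrix(repertoires, overlap=approx_overlap)

# Number of repertoire pairs for 10^4 repertoires, the scale quoted above.
n_pairs = comb(10_000, 2)
```

With 10^4 repertoires there are 49,995,000 unordered pairs to compare, which indicates why an all-against-all approach like this one scales poorly and why a dedicated tool is needed at that size.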

AVAILABILITY AND IMPLEMENTATION

CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.



References

The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires.

Nat Mach Intell. 2021 Nov;3(11):936-944. doi: 10.1038/s42256-021-00413-z. Epub 2021 Nov 16.

Reference-based comparison of adaptive immune receptor repertoires.

Cell Rep Methods. 2022 Aug 22;2(8):100269. doi: 10.1016/j.crmeth.2022.100269.

Phylogenetic analysis of migration, differentiation, and class switching in B cells.

PLoS Comput Biol. 2022 Apr 25;18(4):e1009885. doi: 10.1371/journal.pcbi.1009885. eCollection 2022 Apr.

GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation.

Nat Commun. 2021 Aug 4;12(1):4699. doi: 10.1038/s41467-021-25006-7.

Swarm v3: towards tera-scale amplicon clustering.

Bioinformatics. 2021 Dec 22;38(1):267-269. doi: 10.1093/bioinformatics/btab493.

ClusTCR: a python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity.

Bioinformatics. 2021 Dec 11;37(24):4865-4867. doi: 10.1093/bioinformatics/btab446.

The Future of Blood Testing Is the Immunome.

Front Immunol. 2021 Mar 15;12:626793. doi: 10.3389/fimmu.2021.626793. eCollection 2021.

Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences.

BMC Bioinformatics. 2021 Mar 25;22(1):159. doi: 10.1186/s12859-021-04087-7.

Detecting T cell receptors involved in immune responses from single repertoire snapshots.

PLoS Biol. 2019 Jun 13;17(6):e3000314. doi: 10.1371/journal.pbio.3000314. eCollection 2019 Jun.

T cell receptor β repertoires as novel diagnostic markers for systemic lupus erythematosus and rheumatoid arthritis.

Ann Rheum Dis. 2019 Aug;78(8):1070-1078. doi: 10.1136/annrheumdis-2019-215442. Epub 2019 May 17.
