CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching.

Affiliations

Department of Informatics, University of Oslo, 0316 Oslo, Norway.

Department of Microbiology, Oslo University Hospital, 0424 Oslo, Norway.

Publication information

Bioinformatics. 2022 Sep 2;38(17):4230-4232. doi: 10.1093/bioinformatics/btac505.

Abstract

MOTIVATION

Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching.
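The distinction between exact and approximate matching can be illustrated with a minimal Python sketch. This is not CompAIRR's implementation: the function names, the toy repertoires, and the choice of "at most one substitution" (Hamming distance ≤ 1) as the approximate criterion are all assumptions made for illustration. Each repertoire is treated as a plain list of sequence strings.

```python
from itertools import combinations


def exact_overlap(rep_a, rep_b):
    """Number of distinct sequences found in both repertoires."""
    return len(set(rep_a) & set(rep_b))


def masked_keys(seq):
    """Yield one key per position, with that position left out.

    Two equal-length sequences differ in at most one position
    exactly when they share at least one such key.
    """
    for i in range(len(seq)):
        yield (i, seq[:i], seq[i + 1:])


def approx_overlap(rep_a, rep_b):
    """Number of distinct sequences in rep_a that have a match in rep_b
    within Hamming distance 1 (identical sequences count as matches)."""
    b_set = set(rep_b)
    index = set()
    for s in b_set:
        index.update(masked_keys(s))
    return sum(
        1
        for s in set(rep_a)
        if s in b_set or any(k in index for k in masked_keys(s))
    )


def overlap_matrix(repertoires, overlap=exact_overlap):
    """Overlap for every unordered pair of named repertoires."""
    names = sorted(repertoires)
    return {
        (a, b): overlap(repertoires[a], repertoires[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical toy repertoires, for illustration only.
repertoires = {
    "A": ["CASSLGTDTQYF", "CASSPGQGYEQYF"],
    "B": ["CASSLGTDTQYF", "CASSPGQGYEQYV"],
    "C": ["CAWSVGGNEQFF"],
}

exact = overlap_matrix(repertoires)
approx = overlap_matrix(repertoires, overlap=approx_overlap)
```

In this toy data, repertoires A and B share one identical sequence, and a second pair differs only in its final residue, so the approximate count for that pair is higher than the exact count. Hashing masked keys avoids comparing every sequence against every other, which is one common way to keep near-match lookups cheap.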

RESULTS

CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications.
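The scale of this benchmark can be checked with a short calculation. The sketch below reads the figures as 10^4 repertoires compared in all unordered pairs, which is an assumption about the benchmark setup rather than something the abstract states; the variable names are illustrative.

```python
# Assumption: "pairwise overlap" covers every unordered pair of repertoires.
n_repertoires = 10**4
n_pairs = n_repertoires * (n_repertoires - 1) // 2

# Run times quoted in the abstract, converted to minutes.
compairr_minutes = 17
alternative_minutes = 10 * 24 * 60

# Speed-up implied by this single example.
speedup = alternative_minutes / compairr_minutes

print(f"{n_pairs:,} repertoire pairs")
print(f"speed-up in this example: about {speedup:.0f}-fold")
```

Under these assumptions, the example corresponds to roughly 5 × 10^7 repertoire pairs, and the two quoted run times alone imply a speed-up of several hundred-fold, the same order of magnitude as the 1000-fold figure stated for the tool overall.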

AVAILABILITY AND IMPLEMENTATION

CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/081c/9438946/fa0cc507f9f6/btac505f1.jpg
