An unbiased comparison of immunoglobulin sequence aligners.

Author information

Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel.

Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae556.

DOI:10.1093/bib/bbae556

PMID:39489605

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531861/

Abstract

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is critical for our understanding of the adaptive immune system's dynamics in health and disease. Reliable analysis of AIRR-seq data depends on accurate alignment of rearranged immunoglobulin (Ig) sequences. Various Ig sequence aligners exist, but there is no unified benchmarking standard that represents the complexities of AIRR-seq data, which hinders objective comparison of aligners across tasks. Here, we introduce GenAIRR, a modular simulation framework for generating Ig sequences alongside their ground truths. GenAIRR realistically simulates the intricacies of V(D)J recombination, somatic hypermutation, and an array of sequence corruptions. We comprehensively assessed prominent Ig sequence aligners across various metrics, revealing distinct performance characteristics for each aligner. The GenAIRR-produced datasets, combined with the proposed rigorous evaluation criteria, establish a solid basis for unbiased benchmarking of immunogenetics computational tools. This lays the groundwork for further improving the crucial task of Ig sequence alignment, ultimately enhancing our understanding of adaptive immunity.
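The idea of simulating sequences together with their ground truth can be illustrated with a small toy sketch. This is not GenAIRR's API or its method. The germline segments, the function names `simulate` and `call_accuracy`, and every parameter below are hypothetical. The sketch assumes uniform random point mutations as a crude stand-in for somatic hypermutation (which in reality depends on sequence context) and leaves out sequence corruptions entirely. The field names `v_call`, `d_call`, and `j_call` are borrowed from the AIRR Community naming convention.

```python
import random

# Toy germline segments: hypothetical names and sequences, not real alleles.
V_SEGMENTS = {"V1": "CAGGTGCAGCTGGTGCAGTCT", "V2": "GAGGTGCAGCTGGTGGAGTCT"}
D_SEGMENTS = {"D1": "GGTATAGTGGGAGC", "D2": "AGGATATTGTAGTG"}
J_SEGMENTS = {"J1": "ACTACTTTGACTACTGGGGC", "J2": "TACTGGTACTTCGATCTCTGG"}
BASES = "ACGT"


def random_nt(rng, n):
    """Return n random nucleotides (stand-in for non-templated N additions)."""
    return "".join(rng.choice(BASES) for _ in range(n))


def simulate(rng, mutation_rate=0.05, max_trim=3, max_np=4):
    """Simulate one rearranged sequence and return it with its ground truth."""
    v_call = rng.choice(sorted(V_SEGMENTS))
    d_call = rng.choice(sorted(D_SEGMENTS))
    j_call = rng.choice(sorted(J_SEGMENTS))

    # Trim the segment ends that face the junctions.
    v = V_SEGMENTS[v_call]
    v = v[: len(v) - rng.randint(0, max_trim)]
    d = D_SEGMENTS[d_call]
    d = d[rng.randint(0, max_trim): len(d) - rng.randint(0, max_trim)]
    j = J_SEGMENTS[j_call][rng.randint(0, max_trim):]

    # Random nucleotides inserted at the V-D and D-J junctions.
    np1 = random_nt(rng, rng.randint(0, max_np))
    np2 = random_nt(rng, rng.randint(0, max_np))
    germline = v + np1 + d + np2 + j

    # Uniform point mutations: a simplification, real SHM is context dependent.
    bases = list(germline)
    mutated = []
    for i, base in enumerate(bases):
        if rng.random() < mutation_rate:
            bases[i] = rng.choice([b for b in BASES if b != base])
            mutated.append(i)

    d_start = len(v) + len(np1)
    return {
        "sequence": "".join(bases),
        "v_call": v_call, "d_call": d_call, "j_call": j_call,
        "v_end": len(v),
        "d_start": d_start, "d_end": d_start + len(d),
        "j_start": d_start + len(d) + len(np2),
        "mutation_positions": mutated,
    }


def call_accuracy(truths, predictions, field):
    """Fraction of records where the predicted call equals the true call."""
    hits = sum(t[field] == p[field] for t, p in zip(truths, predictions))
    return hits / len(truths)
```

Because each simulated record carries its true segment calls and boundaries, the output of any aligner on `record["sequence"]` can be scored field by field, for example with `call_accuracy(truths, predictions, "v_call")`.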
