Lin Mao-Jan, Langmead Ben, Safonova Yana
Department of Computer Science, Johns Hopkins University.
Computer Science and Engineering Department, Pennsylvania State University.
bioRxiv. 2024 Jul 23:2024.07.20.604421. doi: 10.1101/2024.07.20.604421.
New high-quality human genome assemblies derived from lymphoblastoid cell lines (LCLs) provide reference genomes and pangenomes for genomics studies. However, the characteristics of LCLs pose technical challenges to profiling immunoglobulin (IG) genes. IG loci in LCLs contain a mixture of germline and somatically recombined haplotypes, making them difficult to genotype or assemble accurately. To address these challenges, we introduce IGLoo, a software tool that implements novel methods for analyzing sequence data and genome assemblies derived from LCLs. IGLoo characterizes somatic V(D)J recombination events in the sequence data and identifies the breakpoints and missing IG genes in the LCL-based assemblies. Furthermore, IGLoo implements a novel reassembly framework to improve germline assembly quality by integrating information about somatic events and population structural variations in the IG loci. We applied IGLoo to study the assemblies from the Human Pangenome Reference Consortium, yielding new insights into the mechanisms, gene usage, and patterns of V(D)J recombination and into the causes of assembly fragmentation in the IG heavy chain (IGH) locus, as well as an improved representation of the IGH assemblies.