European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany.
Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany.
Nat Commun. 2024 Sep 13;15(1):8007. doi: 10.1038/s41467-024-52027-9.
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of varying length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independently of species, uncovering their genomic abundance and suggesting broader implications for disease.