Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
Nat Commun. 2024 Aug 13;15(1):6956. doi: 10.1038/s41467-024-51282-0.
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
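The abstract mentions k-mer similarity analysis without describing how it is computed. As a minimal illustration of the general idea only, the sketch below compares two sequences by the Jaccard similarity of their distinct k-mer sets. The function names, the default `k`, and the choice of Jaccard similarity are assumptions made for this example; they are not taken from VolcanoSV.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the k-mer sets of two sequences.

    Hypothetical helper for illustration; not VolcanoSV's actual measure.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return len(ka & kb) / len(union)


if __name__ == "__main__":
    read = "ACGTACGTTAGC"
    haplotype_a = "ACGTACGTTAGC"
    haplotype_b = "TTTTGGGGCCCC"
    print(jaccard_similarity(read, haplotype_a, k=4))
    print(jaccard_similarity(read, haplotype_b, k=4))
```

A score near 1 means the two sequences share most of their k-mers; a score near 0 means they share few. Real pipelines typically restrict the comparison to informative k-mers and account for sequencing error, which this sketch does not attempt.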
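The abstract reports recall, precision, and F1 scores for SV detection. For readers unfamiliar with these metrics, the sketch below computes them from counts of true-positive, false-positive, and false-negative calls using the standard definitions. The class name and the example counts are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class SVCallCounts:
    """Counts from comparing an SV call set against a truth set (hypothetical container)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: SVCallCounts) -> float:
    """Fraction of reported calls that match the truth set."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: SVCallCounts) -> float:
    """Fraction of truth-set SVs that were reported."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1_score(c: SVCallCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    counts = SVCallCounts(true_positives=90, false_positives=10, false_negatives=30)
    print(f"precision={precision(counts):.3f}")
    print(f"recall={recall(counts):.3f}")
    print(f"F1={f1_score(counts):.3f}")
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high.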