Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment.

Affiliations

Department of Computer Science, Rice University, Houston, TX 77251-1892, USA.

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.

Publication information

GigaScience. 2021 Sep 24;10(9). doi: 10.1093/gigascience/giab063.

PMID: 34561697

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8463296/

Abstract

BACKGROUND

Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that focus primarily on either speed or accuracy. Widely used read mappers (minimap2 and NGMLR) implement various heuristics and scoring schemes to optimize for speed or accuracy, and their performance varies across genomic regions and structural variant types. We hypothesize that constraining read mapping to a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection.

FINDINGS

We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan first maps reads with minimap2, uses the normalized edit distance of the resulting alignments to identify poorly aligned reads, and realigns those reads with NGMLR, a more accurate but computationally more expensive long-read mapper. In support of our hypothesis, we show that Vulcan improves the alignment of Oxford Nanopore Technologies long reads for both simulated and real datasets. These improvements, in turn, lead to more accurate structural variant calling on human genome datasets compared to either read-mapping method alone.
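The read-selection step described above can be illustrated with a minimal Python sketch. It is not Vulcan's implementation: the `MappedRead` record, the normalization by aligned length, and the fixed `cutoff` value are all assumptions made for illustration, since the abstract does not specify how the normalized edit distance is computed or thresholded.

```python
from typing import List, NamedTuple, Tuple


class MappedRead(NamedTuple):
    """Hypothetical summary of one first-pass alignment."""
    name: str
    edit_distance: int   # e.g. the NM tag reported by the first-pass mapper
    aligned_length: int  # number of aligned bases (assumed normalizer)


def normalized_edit_distance(read: MappedRead) -> float:
    """Edit distance divided by aligned length (an assumed normalization)."""
    if read.aligned_length <= 0:
        return 1.0  # treat reads with no aligned bases as maximally poor
    return read.edit_distance / read.aligned_length


def split_for_realignment(
    reads: List[MappedRead], cutoff: float
) -> Tuple[List[MappedRead], List[MappedRead]]:
    """Partition reads into (keep, realign) using a hypothetical cutoff."""
    keep: List[MappedRead] = []
    realign: List[MappedRead] = []
    for read in reads:
        if normalized_edit_distance(read) > cutoff:
            realign.append(read)  # poorly aligned: send to the second mapper
        else:
            keep.append(read)     # well aligned: keep the first-pass result
    return keep, realign


reads = [
    MappedRead("read_a", 50, 10_000),
    MappedRead("read_b", 2_400, 8_000),
    MappedRead("read_c", 300, 12_000),
]
keep, realign = split_for_realignment(reads, cutoff=0.2)
print([r.name for r in keep], [r.name for r in realign])
```

In this toy input only `read_b` exceeds the cutoff, so it alone would be handed to the slower, more accurate mapper, while the other two keep their first-pass alignments. In a dual-mode pipeline the two sets would then be merged into a single alignment file.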

CONCLUSIONS

Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2690/8463296/87c5f3eb07a6/giab063fig1.jpg
