
An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.

Affiliation

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia.

Publication information

Bioinformatics. 2018 Sep 1;34(17):i722-i731. doi: 10.1093/bioinformatics/bty555.

DOI:10.1093/bioinformatics/bty555
PMID:30423085
Abstract

MOTIVATION

Long reads, point-of-care use and freedom from polymerase chain reaction (PCR) are the promises brought by nanopore sequencing. Among the various steps in nanopore data analysis, the end-to-end mapping between the raw electrical current signal sequence and the expected signal sequence of the reference serves as the key building block for signal labeling and for the downstream signal visualization, variant identification and methylation detection. One of the classic algorithms for the signal mapping problem is dynamic time warping (DTW). However, ultra-long nanopore reads and an order-of-magnitude difference in sampling rate between the two sequences complicate the problem and make classical DTW infeasible.
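For reference, classic DTW fills a cost table whose size is the product of the two sequence lengths, which is why both its running time and its memory become prohibitive for ultra-long reads. The following is a minimal Python sketch, assuming absolute difference as the local cost; the function name `dtw` is illustrative and not taken from the cwDTW code.

```python
import math


def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping.

    Assumption: the local cost is the absolute difference between samples.
    Returns the total alignment cost and the end-to-end warping path as
    (i, j) index pairs from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    # cost[i][j] = best cost of aligning x[:i] with y[:j]; row/column 0 are sentinels.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the bottom-right cell, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])` returns a cost of `0.0` and the path `[(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]`: the repeated samples in the second sequence are absorbed by warping rather than penalized.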

RESULTS

Here, we propose a novel multi-level DTW algorithm, continuous wavelet DTW (cwDTW), based on continuous wavelet transforms of the two signal sequences at different scales. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. The peaks and nadirs of the transformed sequences are then extracted to form feature sequences of similar length, which can easily be mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path at the higher level. Comprehensive experiments on two real nanopore datasets, from human and from Pandoraea pnomenusa, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW achieves remarkable acceleration with only a tiny loss of alignment accuracy. On the real nanopore datasets, cwDTW can finish an alignment task in a few seconds, about 3000 times faster than the original DTW. By successfully applying cwDTW to signal labeling and ultra-long sequence comparison, we further demonstrate the power and applicability of cwDTW.
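The coarse-to-fine idea described above can be illustrated with a simplified two-level sketch in Python. It is not the implementation in the cwDTW repository: it assumes a Mexican-hat (Ricker) wavelet, a single coarse level instead of several, absolute difference as the local cost, and a fixed-radius band around the projected path in place of the paper's context-dependent boundary. All function names (`cwt_row`, `extrema`, `dtw_in_band`, `band_from_anchors`, `coarse_to_fine_dtw`) and parameters (`scale_x`, `scale_y`, `radius`) are hypothetical.

```python
import math


def ricker(width, scale):
    """Mexican-hat (Ricker) wavelet sampled at `width` points centred on zero (assumed wavelet)."""
    amp = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    c = (width - 1) / 2.0
    return [amp * (1 - ((k - c) / scale) ** 2) * math.exp(-((k - c) ** 2) / (2 * scale ** 2))
            for k in range(width)]


def cwt_row(signal, scale):
    """One scale of a continuous wavelet transform (zero-padded, same length as input)."""
    width = 2 * min(int(5 * scale), (len(signal) - 1) // 2) + 1  # odd, so the kernel is centred
    kernel, c = ricker(width, scale), width // 2
    return [sum(w * signal[i + k - c] for k, w in enumerate(kernel)
                if 0 <= i + k - c < len(signal))
            for i in range(len(signal))]


def extrema(values):
    """Indices of local peaks and nadirs plus both endpoints: the feature sequence."""
    if len(values) < 3:
        return list(range(len(values)))
    inner = [i for i in range(1, len(values) - 1)
             if (values[i] - values[i - 1]) * (values[i + 1] - values[i]) < 0]
    return [0] + inner + [len(values) - 1]


def dtw_in_band(x, y, lo, hi):
    """DTW that only fills columns lo[i]..hi[i] of row i. Returns (cost, path)."""
    inf, cost = math.inf, {}
    for i in range(len(x)):
        for j in range(lo[i], hi[i] + 1):
            prev = 0.0 if i == j == 0 else min(
                cost.get((i - 1, j), inf), cost.get((i, j - 1), inf), cost.get((i - 1, j - 1), inf))
            cost[(i, j)] = abs(x[i] - y[j]) + prev
    i, j = len(x) - 1, len(y) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):  # follow the cheapest stored predecessor back to the origin
        _, i, j = min((cost.get(p, inf), *p) for p in ((i - 1, j - 1), (i - 1, j), (i, j - 1)))
        path.append((i, j))
    return cost[(len(x) - 1, len(y) - 1)], path[::-1]


def band_from_anchors(anchors, n, m, radius):
    """Column window per row around a piecewise-linear path through the anchor points."""
    lo, hi = [m - 1] * n, [0] * n

    def cover(i, a, b):
        lo[i] = min(lo[i], max(0, a - radius))
        hi[i] = max(hi[i], min(m - 1, b + radius))

    for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
        if i1 == i0:  # horizontal run in the coarse path: one row spans j0..j1
            cover(i0, j0, j1)
            continue
        for i in range(i0, i1 + 1):  # each row reaches the next row's start column
            a = j0 + (j1 - j0) * (i - i0) / (i1 - i0)
            b = j0 + (j1 - j0) * (min(i + 1, i1) - i0) / (i1 - i0)
            cover(i, math.floor(a), math.ceil(b))
    return lo, hi


def coarse_to_fine_dtw(x, y, scale_x=2.0, scale_y=2.0, radius=3):
    """Two-level sketch: align wavelet extrema, then refine on the raw signals inside a band."""
    wx, wy = cwt_row(x, scale_x), cwt_row(y, scale_y)
    ix, iy = extrema(wx), extrema(wy)
    fx, fy = [wx[k] for k in ix], [wy[k] for k in iy]
    _, coarse = dtw_in_band(fx, fy, [0] * len(fx), [len(fy) - 1] * len(fx))  # unconstrained DTW
    anchors = [(ix[p], iy[q]) for p, q in coarse]  # project the feature path to raw indices
    lo, hi = band_from_anchors(anchors, len(x), len(y), radius)
    return dtw_in_band(x, y, lo, hi)
```

Choosing `scale_y` larger than `scale_x` when `y` is sampled more densely (for example, raw current versus one expected value per k-mer) is meant to give feature sequences of similar length, in the spirit of the abstract. The fine pass then only touches cells inside the band, so its work grows with the sequence length times the band width rather than with the full product of the two lengths; the price is that the result can be worse than unconstrained DTW if the coarse path is misleading.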

AVAILABILITY AND IMPLEMENTATION

Our program is available at https://github.com/realbigws/cwDTW.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Similar articles

1. An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.
Bioinformatics. 2018 Sep 1;34(17):i722-i731. doi: 10.1093/bioinformatics/bty555.
2. Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing.
Bioinformatics. 2020 Mar 1;36(5):1333-1343. doi: 10.1093/bioinformatics/btz742.
3. EventDTW: An Improved Dynamic Time Warping Algorithm for Aligning Biomedical Signals of Nonuniform Sampling Frequencies.
Sensors (Basel). 2020 May 9;20(9):2700. doi: 10.3390/s20092700.
4. GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis.
BMC Bioinformatics. 2020 Aug 5;21(1):343. doi: 10.1186/s12859-020-03697-x.
5. Real-time mapping of nanopore raw signals.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i477-i483. doi: 10.1093/bioinformatics/btab264.
6. WarpSTR: determining tandem repeat lengths using raw nanopore signals.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad388.
7. Overlap detection on long, error-prone sequencing reads via smooth q-gram.
Bioinformatics. 2020 Dec 8;36(19):4838-4845. doi: 10.1093/bioinformatics/btaa252.
8. ReadBouncer: precise and scalable adaptive sampling for nanopore sequencing.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i153-i160. doi: 10.1093/bioinformatics/btac223.
9. DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing.
Bioinformatics. 2020 Apr 15;36(8):2578-2580. doi: 10.1093/bioinformatics/btz963.
10. ENANO: Encoder for NANOpore FASTQ files.
Bioinformatics. 2020 Aug 15;36(16):4506-4507. doi: 10.1093/bioinformatics/btaa551.

Cited by

1. A Hitchhiker's Guide to long-read genomic analysis.
Genome Res. 2025 Apr 14;35(4):545-558. doi: 10.1101/gr.279975.124.
2. TDFPS-Designer: an efficient toolkit for barcode design and selection in nanopore sequencing.
Genome Biol. 2024 Nov 4;25(1):285. doi: 10.1186/s13059-024-03423-3.
3. Biological Sequence Classification: A Review on Data and General Methods.
Research (Wash D C). 2022 Dec 19;2022:0011. doi: 10.34133/research.0011. eCollection 2022.
4. Efficient real-time selective genome sequencing on resource-constrained devices.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad046. Epub 2023 Jul 3.
5. WarpSTR: determining tandem repeat lengths using raw nanopore signals.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad388.
6. A survey of mapping algorithms in the long-reads era.
Genome Biol. 2023 Jun 1;24(1):133. doi: 10.1186/s13059-023-02972-3.
7. Multi-omics peripheral and core regions of cancer.
NPJ Syst Biol Appl. 2022 Nov 29;8(1):47. doi: 10.1038/s41540-022-00258-1.
8. Simulation of Nanopore Sequencing Signals Based on BiGRU.
Sensors (Basel). 2020 Dec 17;20(24):7244. doi: 10.3390/s20247244.
9. DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing.
Bioinformatics. 2020 Apr 15;36(8):2578-2580. doi: 10.1093/bioinformatics/btz963.
10. RACS: rapid analysis of ChIP-Seq data for contig based genomes.
BMC Bioinformatics. 2019 Oct 29;20(1):533. doi: 10.1186/s12859-019-3100-2.