An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing.

Author information

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia.

Publication information

Bioinformatics. 2018 Sep 1;34(17):i722-i731. doi: 10.1093/bioinformatics/bty555.

Abstract

MOTIVATION

Long reads, point-of-care use and freedom from polymerase chain reaction are the promises of nanopore sequencing. Among the steps in nanopore data analysis, the end-to-end mapping between the raw electrical current signal sequence and the expected signal sequence of the reference serves as the key building block for signal labeling and for the subsequent signal visualization, variant identification and methylation detection. One of the classic algorithms for the signal mapping problem is dynamic time warping (DTW). However, ultra-long nanopore reads and an order-of-magnitude difference in sampling rate between the two sequences complicate the scenario and make classical DTW infeasible for this problem.
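For illustration, the classical DTW recurrence referred to above can be sketched as follows. This is a minimal textbook version, not the authors' implementation; the function name `dtw` and the absolute-difference cost are assumptions made for the example. The table holds (n+1) x (m+1) entries, so both time and memory grow with the product of the two sequence lengths, which is what becomes impractical for ultra-long reads.

```python
from typing import List, Tuple


def dtw(x: List[float], y: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Textbook DTW with an absolute-difference cost (assumed for this sketch).

    Returns the total alignment cost and the warping path as (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the last cell to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1][j], i - 1, j),          # advance in x only
            (acc[i][j - 1], i, j - 1),          # advance in y only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


cost, path = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(cost, path)
```

In this toy call the second sequence repeats one sample, so the path maps a single position of the first sequence to two positions of the second.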

RESULTS

Here, we propose a novel multi-level DTW algorithm, continuous wavelet DTW (cwDTW), based on continuous wavelet transforms of the two signal sequences at different scales. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. The peaks and nadirs of the transformed sequences are then extracted to form feature sequences of similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets, from human and from Pandoraea pnomenusa, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW achieves a remarkable speed-up with only a small loss of alignment accuracy. On the real nanopore datasets, cwDTW can finish an alignment task in a few seconds, which is about 3000 times faster than the original DTW. By successfully applying cwDTW to the tasks of signal labeling and ultra-long sequence comparison, we further demonstrate the power and applicability of cwDTW.
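The coarse-to-fine idea described above (align at low resolution, project the path upward, then search only inside a boundary around it) can be sketched roughly as follows. This is not the cwDTW implementation and simplifies it in several ways, all of which are assumptions of the example: pairwise averaging (`halve`) stands in for the continuous wavelet transform, both sequences are coarsened by the same factor, peak and nadir extraction is omitted, and a fixed `radius` replaces the context-dependent boundary. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def constrained_dtw(x: List[float], y: List[float],
                    window: Optional[Set[Cell]] = None) -> Tuple[float, List[Cell]]:
    """DTW restricted to the cells in `window` (all cells when window is None)."""
    n, m = len(x), len(y)
    if window is None:
        window = {(i, j) for i in range(n) for j in range(m)}
    inf = float("inf")
    acc: Dict[Cell, float] = {(-1, -1): 0.0}  # virtual start cell
    # Lexicographic order guarantees predecessors are filled in first.
    for i, j in sorted(window):
        best = min(acc.get((i - 1, j - 1), inf),
                   acc.get((i - 1, j), inf),
                   acc.get((i, j - 1), inf))
        acc[(i, j)] = abs(x[i] - y[j]) + best
    # Backtrack from the last cell to the virtual start cell.
    path: List[Cell] = []
    i, j = n - 1, m - 1
    while (i, j) != (-1, -1):
        path.append((i, j))
        i, j = min(((i - 1, j - 1), (i - 1, j), (i, j - 1)),
                   key=lambda c: acc.get(c, inf))
    path.reverse()
    return acc[(n - 1, m - 1)], path


def halve(x: List[float]) -> List[float]:
    """Average adjacent pairs: a crude stand-in for a lower-resolution transform."""
    return [sum(x[k:k + 2]) / len(x[k:k + 2]) for k in range(0, len(x), 2)]


def expand_window(path: List[Cell], n: int, m: int, radius: int) -> Set[Cell]:
    """Project a coarse path to the finer level and widen it by `radius` cells."""
    window: Set[Cell] = set()
    for ci, cj in path:
        for i in range(2 * ci - radius, 2 * ci + 2 + radius):
            for j in range(2 * cj - radius, 2 * cj + 2 + radius):
                if 0 <= i < n and 0 <= j < m:
                    window.add((i, j))
    return window


def multilevel_dtw(x: List[float], y: List[float],
                   radius: int = 2, min_len: int = 8) -> Tuple[float, List[Cell]]:
    """Recursive coarse-to-fine DTW with a fixed-radius search band."""
    if min(len(x), len(y)) <= min_len:
        return constrained_dtw(x, y)  # short enough for unconstrained DTW
    _, coarse_path = multilevel_dtw(halve(x), halve(y), radius, min_len)
    window = expand_window(coarse_path, len(x), len(y), radius)
    return constrained_dtw(x, y, window)


x = [float(k % 7) for k in range(40)]
y = [v for v in x for _ in range(2)]  # each sample repeated twice
cost, path = multilevel_dtw(x, y)
print(cost, path[0], path[-1])
```

Because the fine-level search visits only cells near the projected path, the work at each level grows roughly with the path length rather than with the full product of the two sequence lengths. The price is that the constrained result can cost more than the unconstrained optimum.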

AVAILABILITY AND IMPLEMENTATION

Our program is available at https://github.com/realbigws/cwDTW.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
