Halcyon: an accurate basecaller exploiting an encoder-decoder model with monotonic attention.

Affiliations

Health Intelligence Center.

Human Genome Center.

Publication information

Bioinformatics. 2021 Jun 9;37(9):1211-1217. doi: 10.1093/bioinformatics/btaa953.

DOI:10.1093/bioinformatics/btaa953

PMID:33165508

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8189681/

Abstract

MOTIVATION

In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long-read sequences contribute to the precise detection of structural variations and accurate haplotype phasing. However, deciphering precise DNA sequences from noisy and complicated nanopore raw signals remains a crucial demand for downstream analyses based on higher-quality nanopore sequencing, although various basecallers have been introduced to date.

RESULTS

To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation against input signals. We evaluated performance with a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved competitive performance against the latest Oxford Nanopore Technologies' basecallers.
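The monotonic-attention idea mentioned above can be illustrated with a minimal sketch. This is not Halcyon's implementation, and the abstract does not state which variant the model uses; the sketch shows a generic hard monotonic attention rule as applied at decoding time, in which each output step scans the encoder frames left to right, starting from the frame attended at the previous step, and stops at the first frame whose selection probability reaches 0.5. The function names, the energy values, and the handling of steps where no frame is selected are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping an energy to a selection probability."""
    return 1.0 / (1.0 + math.exp(-x))


def monotonic_attend(energies, start):
    """Scan frames left to right from `start`.

    Returns the index of the first frame whose selection probability
    is at least 0.5, or None if no frame from `start` onward qualifies.
    """
    for j in range(start, len(energies)):
        if sigmoid(energies[j]) >= 0.5:
            return j
    return None


def decode_positions(energy_matrix):
    """Return the attended frame index for each output step.

    energy_matrix[i][j] is a hypothetical energy between output step i
    (e.g. one nucleotide) and encoder frame j (e.g. one signal window).
    """
    positions = []
    start = 0
    for row in energy_matrix:
        j = monotonic_attend(row, start)
        positions.append(j)
        if j is not None:
            # The next step resumes from the same frame, so attention
            # never moves backwards. Leaving `start` unchanged when no
            # frame is selected is a simplification of this sketch.
            start = j
    return positions


# Toy energies (made up for illustration): 3 output steps, 4 frames.
energy = [
    [-2.0, 1.0, -1.0, -3.0],
    [-1.0, -0.5, 2.0, 0.3],
    [5.0, -4.0, -1.0, 0.7],
]
positions = decode_positions(energy)
print(positions)
```

In the third row the large energy at frame 0 is ignored because that frame lies behind the current position. This left-to-right constraint is what lets an alignment between signal and bases be learned without segmenting the signal beforehand.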

AVAILABILITY AND IMPLEMENTATION

The source code (halcyon) can be found at https://github.com/relastle/halcyon.

Similar articles

SACall: A Neural Network Basecaller for Oxford Nanopore Sequencing Data Based on Self-Attention Mechanism.

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):614-623. doi: 10.1109/TCBB.2020.3039244. Epub 2022 Feb 3.

RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data.

BMC Bioinformatics. 2022 Apr 20;23(1):142. doi: 10.1186/s12859-022-04686-y.

Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing.

Sensors (Basel). 2022 Mar 15;22(6):2275. doi: 10.3390/s22062275.

DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing.

Bioinformatics. 2020 Apr 15;36(8):2578-2580. doi: 10.1093/bioinformatics/btz963.

Species-specific basecallers improve actual accuracy of nanopore sequencing in plants.

Plant Methods. 2022 Dec 14;18(1):137. doi: 10.1186/s13007-022-00971-2.

Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing.

Genome Biol. 2021 Jan 19;22(1):38. doi: 10.1186/s13059-020-02255-1.

MSRCall: a multi-scale deep neural network to basecall Oxford Nanopore sequences.

Bioinformatics. 2022 Aug 10;38(16):3877-3884. doi: 10.1093/bioinformatics/btac435.

NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac824.

Performance of neural network basecalling tools for Oxford Nanopore sequencing.

Genome Biol. 2019 Jun 24;20(1):129. doi: 10.1186/s13059-019-1727-y.

Cited by

SqueezeCall: nanopore basecalling using a Squeezeformer network.

GigaByte. 2025 Feb 14;2025:gigabyte148. doi: 10.46471/gigabyte.148. eCollection 2025.

TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering.

Front Genet. 2024 Oct 28;15:1429306. doi: 10.3389/fgene.2024.1429306. eCollection 2024.

Simple, reference-independent assessment to empirically guide correction and polishing of hybrid microbial community metagenomic assembly.

PeerJ. 2024 Nov 8;12:e18132. doi: 10.7717/peerj.18132. eCollection 2024.

RUBICON: a framework for designing efficient deep learning-based genomic basecallers.

Genome Biol. 2024 Feb 16;25(1):49. doi: 10.1186/s13059-024-03181-2.

Aptamer-Functionalized Interface Nanopores Enable Amino Acid-Specific Peptide Detection.

ACS Nano. 2024 Feb 27;18(8):6286-6297. doi: 10.1021/acsnano.3c10679. Epub 2024 Feb 14.

Solid-State Nanopores for Biomolecular Analysis and Detection.

Adv Biochem Eng Biotechnol. 2024;187:283-316. doi: 10.1007/10_2023_240.

Lokatt: a hybrid DNA nanopore basecaller with an explicit duration hidden Markov model and a residual LSTM network.

BMC Bioinformatics. 2023 Dec 7;24(1):461. doi: 10.1186/s12859-023-05580-x.

Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling.

Genome Biol. 2023 Apr 11;24(1):71. doi: 10.1186/s13059-023-02903-2.

Applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era.

Innovation (Camb). 2021 Aug 11;2(4):100153. doi: 10.1016/j.xinn.2021.100153. eCollection 2021 Nov 28.

Nanopore sequencing technology, bioinformatics and applications.

Nat Biotechnol. 2021 Nov;39(11):1348-1365. doi: 10.1038/s41587-021-01108-x. Epub 2021 Nov 8.
