Halcyon: an accurate basecaller exploiting an encoder-decoder model with monotonic attention.

Affiliations

Health Intelligence Center.

Human Genome Center.

Publication information

Bioinformatics. 2021 Jun 9;37(9):1211-1217. doi: 10.1093/bioinformatics/btaa953.

DOI:10.1093/bioinformatics/btaa953
PMID:33165508
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8189681/
Abstract

MOTIVATION

In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long-read sequences contribute to the precise detection of structural variations and accurate haplotype phasing. However, deciphering precise DNA sequences from noisy and complicated nanopore raw signals remains a crucial demand for downstream analyses based on higher-quality nanopore sequencing, although various basecallers have been introduced to date.

RESULTS

To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation against input signals. We evaluated performance with a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved competitive performance against the latest Oxford Nanopore Technologies' basecallers.
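
To make the idea of monotonic attention concrete, the following is a minimal sketch of greedy hard monotonic attention at decoding time, in the general style used in machine translation and speech recognition. It is not Halcyon's implementation (see the linked repository for that); the function names, the 0.5 threshold, and the toy energy values are all assumptions chosen for illustration. The point it shows is that each output step resumes scanning the encoded signal from where the previous step stopped, so the alignment between output bases and signal positions can only move forward, and no pre-segmentation of the signal is needed.

```python
import math


def sigmoid(x):
    """Logistic function mapping an energy to a selection probability."""
    return 1.0 / (1.0 + math.exp(-x))


def monotonic_attend(energies, start, threshold=0.5):
    """Scan encoder positions left to right, beginning at `start`.

    Returns the first index whose selection probability reaches
    `threshold`, or None if no position is selected.
    (Hypothetical helper; the threshold value is an assumption.)
    """
    for j in range(start, len(energies)):
        if sigmoid(energies[j]) >= threshold:
            return j
    return None


def decode_alignment(energy_rows):
    """Return the attended encoder index for each output step.

    energy_rows[i][j] is a hypothetical attention energy between
    output step i (one base) and encoder position j (one signal chunk).
    """
    alignment = []
    position = 0
    for row in energy_rows:
        chosen = monotonic_attend(row, position)
        if chosen is None:
            break  # nothing left to attend to: stop decoding
        alignment.append(chosen)
        position = chosen  # the next scan resumes here, never earlier
    return alignment


# Toy energies for 3 output bases over 5 encoded signal positions.
toy_energies = [
    [-2.0, 1.5, -1.0, -3.0, -2.0],
    [-2.0, -1.0, -0.5, 2.0, -1.0],
    [-3.0, -2.0, -1.0, -1.0, 0.7],
]
alignment = decode_alignment(toy_energies)
print(alignment)
```

In this toy run the attended positions never decrease from one output step to the next, which is the property that distinguishes monotonic attention from unconstrained soft attention. In a trained model the energies would come from learned encoder and decoder states rather than a hand-written table.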

AVAILABILITY AND IMPLEMENTATION

The source code (halcyon) can be found at https://github.com/relastle/halcyon.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/987e/8189681/0bdd5283ed1c/btaa953f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/987e/8189681/ab1a8f47abc6/btaa953f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/987e/8189681/7402e2be7d56/btaa953f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/987e/8189681/af5362722182/btaa953f4.jpg
