End-to-end simulation of nanopore sequencing signals with feed-forward transformers.

Authors

Beslic Denis, Kucklick Martin, Engelmann Susanne, Fuchs Stephan, Renard Bernhard Y, Körber Nils

Affiliations

Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin 13353, Germany.

Institute for Microbiology, Technical University of Braunschweig, Braunschweig 38106, Germany.

Publication details

Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae744.

DOI:10.1093/bioinformatics/btae744
PMID:39710838
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11729726/
Abstract

MOTIVATION

Nanopore sequencing represents a significant advancement in genomics, enabling direct long-read DNA sequencing at the single-molecule level. Accurate simulation of nanopore sequencing signals from nucleotide sequences is crucial for method development and for complementing experimental data. Most existing approaches rely on predefined statistical models, which may not adequately capture the properties of experimental signal data. Furthermore, these simulators were developed for earlier versions of nanopore chemistry, which limits their applicability and adaptability to the latest flow cell data.

RESULTS

To enhance the quality of artificial signals, we introduce seq2squiggle, a novel transformer-based, non-autoregressive model designed to generate nanopore sequencing signals from nucleotide sequences. Unlike existing simulators that rely on static k-mer models, our approach learns sequential contextual information from segmented signal data. We benchmark seq2squiggle against state-of-the-art simulators on real experimental R9.4.1 and R10.4.1 data, evaluating signal similarity, basecalling accuracy, and variant detection rates. Seq2squiggle consistently outperforms existing tools across multiple datasets, demonstrating superior similarity to real data and offering a robust solution for simulating nanopore sequencing signals with the latest flow cell generation.
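The contrast with static k-mer models can be made concrete with a small sketch. The code below is not seq2squiggle and is not taken from any existing simulator; it only illustrates, under stated assumptions, what a context-free k-mer lookup looks like. The table values, the k-mer length of 3, the fixed dwell time, and the Gaussian noise level are all made up for illustration.

```python
import itertools
import random

K = 3  # hypothetical k-mer length, chosen only to keep the toy table small


def toy_kmer_table(k=K):
    """Build a made-up lookup of k-mer -> mean current level (arbitrary units)."""
    rng = random.Random(0)
    return {
        "".join(p): rng.uniform(60.0, 120.0)
        for p in itertools.product("ACGT", repeat=k)
    }


def simulate_static(seq, table, dwell=10, noise_sd=1.5, seed=0):
    """Emit `dwell` noisy samples per k-mer using only that k-mer's table entry.

    The level for a k-mer never depends on the bases around it, which is the
    limitation a context-aware sequence model is meant to address.
    """
    rng = random.Random(seed)
    k = len(next(iter(table)))
    signal = []
    for i in range(len(seq) - k + 1):
        mean = table[seq[i:i + k]]
        signal.extend(rng.gauss(mean, noise_sd) for _ in range(dwell))
    return signal


if __name__ == "__main__":
    table = toy_kmer_table()
    squiggle = simulate_static("ACGTACGT", table)
    print(len(squiggle), "samples")
```

In this sketch, the k-mer `ACG` produces the same expected level whether it appears in `AACGT` or `TACGT`. A learned model that reads the whole sequence could, in principle, vary both the level and the dwell time with the surrounding context.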

AVAILABILITY AND IMPLEMENTATION

seq2squiggle is freely available on GitHub at: github.com/ZKI-PH-ImageAnalysis/seq2squiggle.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d166/11729726/992d6f374a2b/btae744f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d166/11729726/ea7340332c33/btae744f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d166/11729726/9cba37fc4d35/btae744f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d166/11729726/216a6dadf5e3/btae744f4.jpg
