
Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing.

Affiliation

Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland.

Publication

Sensors (Basel). 2022 Mar 15;22(6):2275. doi: 10.3390/s22062275.

DOI:10.3390/s22062275
PMID:35336445
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8954548/
Abstract

Third-generation DNA sequencers from Oxford Nanopore Technologies (ONT) produce a series of samples of the electrical current in a nanopore. This time series is used to determine the sequence of nucleotides. Translating current values into nucleotide symbols is called basecalling. Various basecalling solutions have already been proposed. The earliest were based on Hidden Markov Models, but the best ones use neural networks or other machine learning models. Unfortunately, the accuracy achieved is still lower than that of competing sequencing techniques, such as Illumina's. Basecallers differ in their input data type: currently, most work on raw data straight from the sequencer (a time series of current values). Still, the approach of using event data is also explored. Event data are obtained by preprocessing the raw data and dividing it into segments, each described by several features computed from the raw values within that segment. We propose a novel basecaller that jointly processes raw and event data. We define basecalling as sequence-to-sequence translation and use a machine learning model based on an encoder-decoder architecture of recurrent neural networks. Our model incorporates twin encoders and an attention mechanism. We tested our solution on simulated and real datasets. We compare the accuracy of the full model with that of its components, which process only raw or only event data. We also compare our solution with the existing ONT basecaller, Guppy. The results of numerical experiments show that joint processing of raw and event data provides better basecalling accuracy than processing each data type separately. We implement an application called Ravvent, freely available under the MIT licence.
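
The abstract describes event data as segments of the raw signal, each summarised by a few features. As a rough illustration only, the sketch below splits a toy current trace into events and computes a mean, standard deviation, and length per segment. The segmentation rule, the `threshold` and `min_len` parameters, the feature set, and the function name `segment_events` are all assumptions made for this example; the abstract does not specify how Ravvent or ONT tools actually compute events.

```python
from statistics import mean, pstdev


def segment_events(raw, threshold=5.0, min_len=2):
    """Split a raw current trace into events and summarise each one.

    Hypothetical rule: a new event starts when the next sample deviates
    from the running mean of the current segment by more than `threshold`.
    """
    events = []
    start = 0
    for i in range(1, len(raw) + 1):
        segment = raw[start:i]
        at_end = i == len(raw)
        # Only test for a jump once the segment has at least `min_len` samples.
        jumped = (
            not at_end
            and len(segment) >= min_len
            and abs(raw[i] - mean(segment)) > threshold
        )
        if at_end or jumped:
            events.append({
                "start": start,
                "length": len(segment),
                "mean": mean(segment),
                "stdv": pstdev(segment),
            })
            start = i
    return events


# Toy trace with three plateaus (made-up values, not real sequencer output).
raw_signal = [80.1, 79.8, 80.3, 95.2, 94.9, 95.4, 95.0, 70.2, 69.9, 70.1]
events = segment_events(raw_signal)
for event in events:
    print(event)
```

In a joint model of the kind the abstract outlines, the raw samples and a feature sequence like this one would each feed a separate encoder. How the two encodings are combined is not described in the abstract and is not modelled here.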


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82fb/8954548/9c7d12930746/sensors-22-02275-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82fb/8954548/4d7dcec1e924/sensors-22-02275-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82fb/8954548/19a2bd5d4d50/sensors-22-02275-g006.jpg


