NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm.

Authors

Wang Luotong, Qu Li, Yang Longshu, Wang Yiying, Zhu Huaiqiu

Affiliations

State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, China.

Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States.

Publication information

Front Genet. 2020 Aug 12;11:900. doi: 10.3389/fgene.2020.00900. eCollection 2020.

DOI: 10.3389/fgene.2020.00900
PMID: 32903372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7434944/

Abstract

Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of nanopore sequencing devices that produce very long reads, with an expected impact on genomics. However, nanopore sequencing reads have a fairly high error rate because DNA bases are difficult to identify from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing over the past years, correcting the sequences after basecalling remains challenging. In this study, we developed NanoReviser, an open-source DNA basecalling reviser based on a deep learning algorithm, to correct the basecalling errors introduced by the current default basecallers. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of information from both the raw electrical signals and the basecalled sequences. Our results showed that NanoReviser, as a post-basecalling reviser, significantly improved basecalling quality. After being trained on standard ONT sequencing reads from the public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than that of all current basecalling tools. Furthermore, we analyzed the modified bases in the E. coli dataset and added methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and by over 10% specifically in regions of the sequence rich in methylated bases.
To the best of our knowledge, NanoReviser is the first post-basecalling tool that accurately corrects nanopore sequences without the time-consuming step of building a consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.
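
The abstract describes NanoReviser's input only at a high level: the raw electrical signal is re-segmented according to the default basecaller's output, and information from both the signal and the called sequence is fed to CNN and Bi-LSTM layers. As an illustration only, the Python sketch below shows one way such per-base inputs could be prepared; it is not the authors' implementation. It assumes that the basecaller reports, for each called base, the index of the first raw sample assigned to it (for example, derived from a move table). The helper names `resegment`, `base_features` and `windows`, the chosen statistics (mean, standard deviation and number of samples) and the zero-padded sliding window are all hypothetical choices.

```python
"""Hypothetical sketch: per-base features from a re-segmented nanopore signal."""
from statistics import mean, pstdev

BASES = "ACGT"


def resegment(signal, seq, starts):
    """Split the raw signal into one chunk per called base.

    `starts[i]` is the (assumed) index of the first raw sample assigned
    to base i; the last chunk runs to the end of the signal.
    """
    if len(seq) != len(starts):
        raise ValueError("need one start index per called base")
    bounds = list(starts) + [len(signal)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(len(seq))]


def base_features(signal, seq, starts):
    """Pair each called base with summary statistics of its signal chunk.

    Each feature vector is a one-hot encoding of the called base followed
    by the chunk's mean, population standard deviation and sample count.
    """
    features = []
    for base, chunk in zip(seq, resegment(signal, seq, starts)):
        if not chunk:
            # Non-increasing start indices would produce empty chunks.
            raise ValueError("every base needs at least one raw sample")
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        features.append(one_hot + [mean(chunk), pstdev(chunk), float(len(chunk))])
    return features


def windows(features, size):
    """Build one fixed-size window of per-base features for every base.

    The read ends are padded with zero vectors so each base has a full
    window; a downstream CNN/Bi-LSTM model would consume these windows.
    """
    half = size // 2
    pad = [[0.0] * len(features[0])] * half
    padded = pad + features + pad
    return [padded[i:i + size] for i in range(len(features))]
```

For example, `base_features([10.0, 12.0, 11.0, 20.0, 22.0, 5.0], "ACG", [0, 3, 5])` yields three feature vectors, the second ending in `[21.0, 1.0, 2.0]` (mean, standard deviation and length of the chunk `[20.0, 22.0]`). Converting the windows to tensors and defining the CNN and Bi-LSTM layers is omitted, since the abstract does not specify the network configuration.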
