Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data.

Authors

Mencius Jun, Chen Wenjun, Zheng Youqi, An Tingyi, Yu Yongguo, Sun Kun, Feng Huijuan, Feng Zhixing

Affiliations

Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China.

Department of Clinical Genetics, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.

Publication information

Nat Commun. 2025 May 2;16(1):4102. doi: 10.1038/s41467-025-59378-x.

DOI: 10.1038/s41467-025-59378-x
PMID: 40316544
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12048652/

Abstract

As nanopore sequencing has been widely adopted, data accumulation has surged, resulting in over 700,000 public datasets. While these data hold immense potential for advancing genomic research, their utility is compromised by the absence of flowcell type and basecaller configuration in about 85% of the data and associated publications. These parameters are essential for many analysis algorithms, and their misapplication can lead to significant drops in performance. To address this issue, we present LongBow, designed to infer flowcell type and basecaller configuration directly from the base quality value patterns of FASTQ files. LongBow has been tested on 66 in-house basecalled FAST5/POD5 datasets and 1989 public FASTQ datasets, achieving accuracies of 95.33% and 91.45%, respectively. We demonstrate its utility by reanalyzing nanopore sequencing data from the COVID-19 Genomics UK (COG-UK) project. The results show that LongBow is essential for reproducing reported genomic variants and, through a LongBow-based analysis pipeline, we discovered substantially more functionally important variants while improving accuracy in lineage assignment. Overall, LongBow is poised to play a critical role in maximizing the utility of public nanopore sequencing data, while significantly enhancing the reproducibility of related research.
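
To make "base quality value patterns" concrete, the following minimal Python sketch tallies the Phred quality values in a FASTQ file into a normalized histogram. It is not LongBow's algorithm, which the abstract does not describe. The function names `iter_fastq_qualities` and `quality_histogram` are hypothetical, and the sketch assumes 4-line records with Phred+33 encoding.

```python
from collections import Counter
from io import StringIO


def iter_fastq_qualities(handle):
    """Yield the quality string of each 4-line FASTQ record (assumed layout)."""
    while True:
        header = handle.readline()
        if not header:                 # end of file
            return
        handle.readline()              # sequence line (unused here)
        handle.readline()              # '+' separator line
        yield handle.readline().strip()


def quality_histogram(handle, offset=33):
    """Return the normalized frequency of each Phred quality value.

    Assumes Phred+33 encoding: quality = ord(character) - 33.
    """
    counts = Counter()
    for quality in iter_fastq_qualities(handle):
        counts.update(ord(ch) - offset for ch in quality)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {q: n / total for q, n in sorted(counts.items())}


# Two toy reads: '!' encodes Q0, '5' encodes Q20, 'I' encodes Q40.
example = StringIO(
    "@read1\nACGT\n+\n!!5I\n"
    "@read2\nGGCA\n+\n5555\n"
)
hist = quality_histogram(example)
print(hist)
```

A histogram like this is only one simple summary of a quality-value pattern. How LongBow maps such patterns to a flowcell type and basecaller configuration is described in the paper itself, not in this sketch.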

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/5e328ef352b4/41467_2025_59378_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/61125bb6ff17/41467_2025_59378_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/c61e305567de/41467_2025_59378_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/ba339aa2590a/41467_2025_59378_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/556c12d7bb45/41467_2025_59378_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/5bd568eabe44/41467_2025_59378_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08b0/12048652/ce53944a2cc5/41467_2025_59378_Fig7_HTML.jpg

Similar articles

1
Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data.
Nat Commun. 2025 May 2;16(1):4102. doi: 10.1038/s41467-025-59378-x.
2
SACall: A Neural Network Basecaller for Oxford Nanopore Sequencing Data Based on Self-Attention Mechanism.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):614-623. doi: 10.1109/TCBB.2020.3039244. Epub 2022 Feb 3.
3
Analytical Performance of a Novel Nanopore Sequencing for SARS-CoV-2 Genomic Surveillance.
J Med Virol. 2024 Dec;96(12):e70108. doi: 10.1002/jmv.70108.
4
Comparison of Illumina versus Nanopore 16S rRNA Gene Sequencing of the Human Nasal Microbiota.
Genes (Basel). 2020 Sep 21;11(9):1105. doi: 10.3390/genes11091105.
5
A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing.
Nat Commun. 2024 Feb 16;15(1):1448. doi: 10.1038/s41467-024-45778-y.
6
NanoCore: core-genome-based bacterial genomic surveillance and outbreak detection in healthcare facilities from Nanopore and Illumina data.
mSystems. 2024 Nov 19;9(11):e0108024. doi: 10.1128/msystems.01080-24. Epub 2024 Oct 7.
7
Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing.
Sensors (Basel). 2023 Jul 29;23(15):6787. doi: 10.3390/s23156787.
8
Demultiplexing and barcode-specific adaptive sampling for nanopore direct RNA sequencing.
Nat Commun. 2025 Apr 21;16(1):3742. doi: 10.1038/s41467-025-59102-9.
9
Flexible and efficient handling of nanopore sequencing signal data with slow5tools.
Genome Biol. 2023 Apr 6;24(1):69. doi: 10.1186/s13059-023-02910-3.
10
Simulation of nanopore sequencing signal data with tunable parameters.
Genome Res. 2024 Jun 25;34(5):778-783. doi: 10.1101/gr.278730.123.
