Abdel-Glil Mostafa Y, Brandt Christian, Pletz Mathias W, Neubauer Heinrich, Sprague Lisa D
Institute of Bacterial Infections and Zoonoses, Friedrich-Loeffler-Institut, Naumburger Str. 96A, 07743 Jena, Germany.
Institute for Infectious Diseases and Infection Control, Jena University Hospital - Friedrich Schiller University, Jena, Germany.
Microb Genom. 2025 Mar;11(3). doi: 10.1099/mgen.0.001372.
Nanopore sequencing is a third-generation technology known for its portability, real-time analysis and ability to generate long reads. It has great potential for use in clinical diagnostics, but thorough validation is required to address accuracy concerns and ensure reliable and reproducible results. In this study, we automated an open-source workflow (freely available at https://gitlab.com/FLI_Bioinfo/nanobacta) for the assembly of Oxford Nanopore sequencing data and used it to investigate the reproducibility of assembly results under consistent conditions. We used a benchmark dataset of five bacterial reference strains and generated eight technical sequencing replicates of the same DNA using the Ligation and Rapid Barcoding kits together with the Flongle and MinION flow cells. We assessed reproducibility by measuring discrepancies such as substitution and insertion/deletion errors, analysing plasmid recovery results and examining genetic markers and clustering information. We compared the results of genome assemblies with and without short-read polishing. Our results show an average reproducibility accuracy of 99.999955% for nanopore-only assemblies and 99.999996% when short reads were used for polishing. For the nanopore-only assemblies, the genomic analysis results were highly reproducible in the following areas: identification of genetic markers for antimicrobial resistance and virulence, classical MLST, taxonomic classification, genome completeness and contamination analysis. Interestingly, the clustering results from the core genome SNP (cgSNP) and core genome MLST (cgMLST) analyses were also highly reproducible for the nanopore-only assemblies, with pairwise differences across replicates of up to two alleles in cgMLST and up to two SNPs in cgSNP. After polishing the assemblies with short reads, the pairwise differences across replicates were 0 alleles for cgMLST and 0 to 1 SNP for cgSNP. These results highlight the advances in sequencing accuracy of nanopore data without the use of short reads.
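The two reproducibility measures reported above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: that reproducibility accuracy is the percentage of genome positions without a substitution or insertion/deletion discrepancy between replicate assemblies, and that cgMLST pairwise differences are counted only over loci called in both replicates. The function names and the allele profiles are hypothetical and are not taken from the nanobacta workflow.

```python
from itertools import combinations


def reproducibility_accuracy(substitutions, indels, genome_length):
    """Percent of positions that agree between two replicate assemblies.

    Assumed definition: 100 * (1 - discrepancies / genome length).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    discrepancies = substitutions + indels
    return 100.0 * (1.0 - discrepancies / genome_length)


def allele_differences(profile_a, profile_b):
    """Count loci whose allele numbers differ between two cgMLST profiles.

    Loci missing (None) in either profile are skipped (an assumption).
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def max_pairwise_difference(profiles):
    """Largest allele difference over all pairs of replicate profiles."""
    return max(
        (allele_differences(p, q) for p, q in combinations(profiles.values(), 2)),
        default=0,
    )


# Hypothetical replicate profiles over five core loci (illustrative only).
replicates = {
    "rep1": [1, 4, 7, 2, 9],
    "rep2": [1, 4, 7, 2, 9],
    "rep3": [1, 4, 8, 2, None],
}

if __name__ == "__main__":
    print(reproducibility_accuracy(substitutions=1, indels=1, genome_length=5_000_000))
    print(max_pairwise_difference(replicates))
```

Under this assumed definition, an accuracy of 99.999955% would correspond to roughly 4.5 discrepancies per 10 million bases.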