Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
Centre for Pathogen Genomics, The University of Melbourne, Melbourne, Australia.
Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.
Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and the detection of antimicrobial resistance. This study presents a comprehensive benchmark of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to data from ONT's super-high accuracy model. ONT's superior performance is attributed to its ability to overcome Illumina's errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT's super-high accuracy data mitigates ONT's traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10× depth of ONT super-high accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.
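Precision and recall, the two metrics named above, can be illustrated with a minimal sketch. The abstract does not describe how called variants were matched against the truth set, so the exact-match comparison below is an assumption made for illustration only; the `Variant` type, the `precision_recall` function, and the toy data are all hypothetical and do not come from the study.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A variant identified by contig, 1-based position, and alleles (hypothetical type)."""
    contig: str
    pos: int
    ref: str
    alt: str


def precision_recall(truth: Set[Variant], calls: Set[Variant]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact matching of variants.

    Assumption: a call counts as a true positive only if contig, position,
    ref, and alt all match a truth variant exactly. Dedicated benchmarking
    tools also reconcile different representations of the same variant,
    which this sketch does not attempt.
    """
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy data: four true variants, five calls, three of which are correct.
truth = {
    Variant("chr", 100, "A", "G"),
    Variant("chr", 250, "C", "T"),
    Variant("chr", 400, "GA", "G"),
    Variant("chr", 900, "T", "TA"),
}
calls = {
    Variant("chr", 100, "A", "G"),
    Variant("chr", 250, "C", "T"),
    Variant("chr", 400, "GA", "G"),
    Variant("chr", 555, "G", "A"),   # false positive
    Variant("chr", 700, "C", "G"),   # false positive
}

precision, recall, f1 = precision_recall(truth, calls)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With this toy data, three of five calls are correct (precision 0.60) and three of four true variants are recovered (recall 0.75). The same function could be applied per depth level to compare how the metrics change as reads are subsampled.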