Hallgren Malte B, Clausen Philip T L C, Aarestrup Frank M
National Food Institute, Technical University of Denmark, Kemitorvet 204, 2800, Kgs. Lyngby, Denmark.
Biol Methods Protoc. 2024 Aug 6;9(1):bpae057. doi: 10.1093/biomethods/bpae057. eCollection 2024.
Rapid advances in sequencing technologies have driven significant progress in microbial genomics, yet accurately identifying microbial strain diversity in metagenomic samples remains challenging, especially when working with noisy long-read data from platforms such as Oxford Nanopore Technologies (ONT). In this article, we introduce NanoMGT, a tool designed to enhance marker gene typing in low-complexity mono-species samples by leveraging the unique properties of long reads. NanoMGT accurately identifies mutations despite high error rates, enabling the reliable detection of multiple strain-specific marker genes. The tool implements a novel scoring system that rewards mutations co-occurring across different reads and penalizes densely grouped, likely erroneous variants, thereby achieving a good balance between sensitivity and precision. A comparative evaluation of NanoMGT, using a simulated multi-strain sample of seven bacterial species, demonstrated superior performance relative to existing tools and showed the advantages of a threshold-based filtering approach for calling minority variants in ONT sequencing data. NanoMGT's potential as a post-binning tool in metagenomic pipelines is particularly notable, enabling researchers to determine specific alleles more accurately and to understand strain diversity in microbial communities. Our findings have significant implications for clinical diagnostics, environmental microbiology, and the broader field of genomics, offering a reliable and efficient approach to marker gene typing in complex metagenomic samples.
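The scoring idea described above can be illustrated with a minimal Python sketch. The abstract does not give NanoMGT's actual scoring function, so everything here is an assumption made for illustration: the `Candidate` structure, the linear form of the adjustment, and all parameter names and values (`base_threshold`, `cooccurrence_reward`, `proximity_penalty`, `window`, `min_shared_reads`, `min_threshold`) are hypothetical. The sketch lowers the required support for a candidate mutation when it shares reads with candidates at other positions, and raises it when other candidates crowd the same neighbourhood.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate minority variant (hypothetical structure, not NanoMGT's)."""
    position: int        # 0-based position in the marker gene
    alt_base: str        # alternative base observed at this position
    depth: int           # number of reads covering the position
    read_ids: frozenset  # IDs of the reads carrying the alternative base


def adjusted_threshold(cand, candidates, base_threshold, cooccurrence_reward,
                       proximity_penalty, window, min_shared_reads,
                       min_threshold):
    """Return the support fraction this candidate must reach to be called."""
    others = [o for o in candidates if o.position != cand.position]
    # Reward: candidates elsewhere that share enough supporting reads.
    cooccurring = sum(
        1 for o in others
        if len(cand.read_ids & o.read_ids) >= min_shared_reads
    )
    # Penalty: candidates packed within `window` bases of this one.
    crowded = sum(
        1 for o in others if abs(o.position - cand.position) <= window
    )
    threshold = (base_threshold
                 - cooccurrence_reward * cooccurring
                 + proximity_penalty * crowded)
    return max(threshold, min_threshold)


def call_variants(candidates, base_threshold=0.06, cooccurrence_reward=0.02,
                  proximity_penalty=0.02, window=5, min_shared_reads=2,
                  min_threshold=0.01):
    """Keep candidates whose support fraction meets their adjusted threshold."""
    called = []
    for cand in candidates:
        support = len(cand.read_ids) / cand.depth if cand.depth else 0.0
        threshold = adjusted_threshold(
            cand, candidates, base_threshold, cooccurrence_reward,
            proximity_penalty, window, min_shared_reads, min_threshold)
        if support >= threshold:
            called.append(cand)
    return sorted(called, key=lambda c: c.position)


def reads(start, stop):
    """Build a set of synthetic read IDs r<start> .. r<stop - 1>."""
    return frozenset(f"r{i}" for i in range(start, stop))


# Toy data at depth 100 (invented for illustration only).
candidates = [
    # Two low-frequency mutations carried by the same five reads.
    Candidate(100, "A", 100, reads(0, 5)),
    Candidate(400, "T", 100, reads(0, 5)),
    # Three tightly clustered candidates on unrelated reads.
    Candidate(200, "G", 100, reads(10, 17)),
    Candidate(202, "C", 100, reads(20, 27)),
    Candidate(204, "A", 100, reads(30, 37)),
    # An isolated candidate with weak support.
    Candidate(300, "G", 100, reads(40, 43)),
]

called = call_variants(candidates)
print([c.position for c in called])  # [100, 400]
```

In this toy input, the two mutations at positions 100 and 400 have only 5% support, below the base threshold of 6%, but are kept because they co-occur on the same reads. The clustered candidates at 200, 202, and 204 have 7% support yet are rejected because each has two close neighbours. A single fixed cutoff could not produce both outcomes, which is the motivation for an adaptive threshold of this kind.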