Advancing metagenome-assembled genome-based pathogen identification: unraveling the power of long-read assembly algorithms in Oxford Nanopore sequencing.

Affiliations

Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, University of Maryland, College Park, Maryland, USA.

Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, USA.

Publication information

Microbiol Spectr. 2024 Jun 4;12(6):e0011724. doi: 10.1128/spectrum.00117-24. Epub 2024 Apr 30.

DOI: 10.1128/spectrum.00117-24

PMID: 38687063

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11237517/

Abstract

Oxford Nanopore sequencing is a high-throughput sequencing technology that facilitates the reconstruction of metagenome-assembled genomes (MAGs). This study aimed to assess the potential of long-read assembly algorithms for Oxford Nanopore sequencing to enhance MAG-based identification of bacterial pathogens, using both simulated and mock communities. Simulated communities were generated to mimic those on fresh spinach and in surface water. Long reads were produced for R9.4.1+SQK-LSK109 and R10.4+SQK-LSK112 at depths of 0.5, 1, and 2 million reads. The simulated bacterial communities included multidrug-resistant Salmonella enterica serotypes Heidelberg, Montevideo, and Typhimurium in the fresh spinach community, individually or in combination, as well as multidrug-resistant in the surface water community. Real data sets from the ZymoBIOMICS HMW DNA Standard (the mock community) were also analyzed. A bioinformatic pipeline (MAGenie, freely available at https://github.com/jackchen129/MAGenie) that combines metagenome assembly, taxonomic classification, and sequence extraction was developed to reconstruct draft MAGs from metagenome assemblies. Five assemblers were evaluated through a series of genomic analyses. Overall, Flye outperformed the other assemblers, followed by Shasta, Raven, and Unicycler, while Canu performed worst. In some instances, the extracted sequences yielded draft MAGs and revealed the locations and structures of antimicrobial resistance genes and mobile genetic elements. The extracted sequences also proved suitable for precise phylogenetic inference, as shown by the consistent phylogenetic topology between the reference genome and the extracted sequences. R9.4.1+SQK-LSK109 was more effective than R10.4+SQK-LSK112 in most cases, and greater sequencing depths generally led to more accurate results.

Importance

By examining diverse bacterial communities, particularly those containing multiple serotypes, this study reveals the potential of long-read assembly algorithms to improve metagenome-assembled genome (MAG)-based pathogen identification through Oxford Nanopore sequencing. Our research demonstrates that long-read assembly is a promising avenue for improving the precision of MAG-based pathogen identification, thus supporting the development of more robust surveillance measures. The findings also support ongoing efforts to fine-tune a bioinformatic pipeline for accurate pathogen identification in complex metagenomic samples.
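The abstract describes the pipeline's sequence-extraction step only at a high level. As a rough illustration of the general idea (not MAGenie's code), the sketch below keeps the contigs of an assembly whose per-contig taxonomic assignment matches a pathogen of interest, producing a draft MAG. The input format, a tab-separated table modeled on Kraken2's standard per-sequence output (status, contig ID, taxonomy ID, length, LCA mapping), and the function names `read_fasta`, `contigs_for_taxa`, and `extract_draft_mag` are assumptions made for this example.

```python
"""Hypothetical sketch of taxon-based sequence extraction from a metagenome assembly.

Not MAGenie's implementation. Assumes each contig was already classified and the
result is a tab-separated table modeled on Kraken2's standard output:
status (C/U), contig ID, taxonomy ID, length, LCA mapping.
"""
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence); the ID is the header text up to the first whitespace."""
    header: Optional[str] = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_for_taxa(classification: Iterable[str], target_taxids: Set[str]) -> Set[str]:
    """Return IDs of classified ("C") contigs whose assigned taxonomy ID is a target."""
    selected: Set[str] = set()
    for row in classification:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip malformed rows and unclassified contigs
        contig_id, taxid = fields[1], fields[2]
        if taxid in target_taxids:
            selected.add(contig_id)
    return selected


def extract_draft_mag(
    fasta_lines: Iterable[str], classification: Iterable[str], target_taxids: Set[str]
) -> Dict[str, str]:
    """Collect the contigs assigned to the target taxa into one draft-MAG dictionary."""
    keep = contigs_for_taxa(classification, target_taxids)
    return {cid: seq for cid, seq in read_fasta(fasta_lines) if cid in keep}


if __name__ == "__main__":
    # Toy inputs for illustration only; 28901 and 562 are the NCBI Taxonomy IDs
    # of Salmonella enterica and Escherichia coli.
    fasta = [">contig_1 len=8", "ACGTACGT", ">contig_2", "GGGG", "CCCC", ">contig_3", "TTTT"]
    table = [
        "C\tcontig_1\t28901\t8\t28901:4",
        "C\tcontig_2\t562\t8\t562:4",
        "U\tcontig_3\t0\t4\t0:4",
    ]
    print(extract_draft_mag(fasta, table, {"28901"}))  # {'contig_1': 'ACGTACGT'}
```

Matching exact taxonomy IDs keeps the sketch short, but contigs assigned to a serotype-level node or only to the genus would be missed; in practice a taxonomy-aware rule, such as accepting every descendant of the target ID, is likely needed.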