Greenman Noah, Hassouneh Sayf Al-Deen, Abdelli Latifa S, Johnston Catherine, Azarian Taj
College of Medicine, University of Central Florida, Orlando, FL 32827, USA.
Department of Health Science, College of Health Professions and Sciences, University of Central Florida, Orlando, FL 32816, USA.
Microorganisms. 2024 May 4;12(5):935. doi: 10.3390/microorganisms12050935.
Metagenomic sequencing analysis is central to investigating microbial communities in clinical and environmental studies. Short-read sequencing remains the primary approach for metagenomic research; however, long-read sequencing may offer improved metagenomic assembly and better-resolved taxonomic identification. To compare the relative performance of the two approaches for metagenomic studies, we simulated short- and long-read datasets using increasingly complex metagenomes comprising 10, 20, and 50 microbial taxa. Additionally, we used an empirical dataset of paired short- and long-read data generated from mouse fecal pellets to assess real-world performance. We compared metagenomic assembly quality, taxonomic classification, and metagenome-assembled genome (MAG) recovery rates. We show that long-read sequencing data significantly improve taxonomic classification and assembly quality. Metagenomic assemblies using simulated long reads were more complete and more contiguous, with higher rates of MAG recovery. This resulted in more precise taxonomic classifications. Principal component analysis of the empirical data demonstrated that sequencing technology affects compositional results, as samples clustered by sequence type rather than by sample type. Overall, we highlight strengths of long-read metagenomic sequencing for microbiome studies, including improved accuracy of classification and relative abundance estimates. These results will aid researchers in deciding which sequencing approaches to use for metagenomic projects.