Zhang Zhenmiao, Yang Chao, Veldsman Werner Pieter, Fang Xiaodong, Zhang Lu
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
BGI Genomics, BGI-Shenzhen, Shenzhen, China.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad087.
Metagenome assembly is an efficient approach to reconstructing microbial genomes from metagenomic sequencing data. Although short-read sequencing has been widely used for metagenome assembly, linked- and long-read sequencing have shown advantages for assembly by providing long-range DNA connectedness. Many metagenome assembly tools have been developed to simplify assembly graphs and resolve repeats in microbial genomes. However, there is still no comprehensive evaluation of metagenomic sequencing technologies, and practical guidance on selecting appropriate metagenome assembly tools is lacking. This paper presents a comprehensive benchmark of 19 commonly used assembly tools applied to metagenomic sequencing datasets obtained from simulations, mock communities or human gut microbiomes. These datasets were generated using mainstream sequencing platforms, including Illumina and BGISEQ short-read sequencing, 10x Genomics linked-read sequencing, and PacBio and Oxford Nanopore long-read sequencing. The assembly tools were extensively evaluated against multiple criteria. The evaluation revealed that long-read assemblers generated highly contiguous contigs but failed to recover some medium- and high-quality metagenome-assembled genomes (MAGs). Linked-read assemblers obtained the highest overall number of near-complete MAGs from the human gut microbiomes. Hybrid assemblers using both short- and long-read sequencing data were promising methods for improving both the total assembly length and the number of near-complete MAGs. This paper also discusses the running time and peak memory consumption of these assembly tools and provides practical guidance on selecting them.
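Contig contiguity and total assembly length, two of the properties compared above, are conventionally summarized by N50: the contig length L such that contigs of length at least L together cover at least half of the total assembled length. The Python sketch below computes both from a list of contig lengths. It is a generic illustration of the standard metric, not the evaluation pipeline used in this benchmark, and the function name `assembly_stats` is hypothetical.

```python
def assembly_stats(contig_lengths):
    """Return (total_length, n50) for a list of contig lengths.

    Hypothetical helper illustrating the standard N50 definition:
    the length L such that contigs of length >= L together cover
    at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison (2 * running >= total) avoids float rounding.
        if 2 * running >= total:
            return total, length


print(assembly_stats([100, 200, 300, 400, 500]))  # (1500, 400)
```

For the five contigs in the usage line, the cumulative sum of the longest contigs first passes half of 1,500 bp at the 400 bp contig, so N50 is 400. A more contiguous assembly of the same total length yields a higher N50.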