Faculty of Arts and Science, Department of Biology, Bolu Abant İzzet Baysal University, Bolu, Turkey.
Polar Terrestrial Environmental Systems, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Potsdam, Germany.
Folia Microbiol (Praha). 2022 Oct;67(5):801-810. doi: 10.1007/s12223-022-00980-7. Epub 2022 Jun 6.
Next-generation sequencing methods provide comprehensive data for the structural and functional analysis of the genome. Draft genomes with a low contig count and a high N50 value give insight into genome structure and support genome annotation. In this study, we designed a pipeline for assembling prokaryotic draft genomes with few contigs and a high N50 value. We aimed to combine two de novo assembly tools (SPAdes and IDBA-Hybrid) and to evaluate the impact of this approach on the quality metrics of the assemblies. The pipeline was tested on raw short-read sequence data (read length < 300 bp) from a total of 10 species belonging to four different genera. To obtain the final draft genomes, we first assembled the reads with SPAdes and extracted the 16S rRNA sequence from this assembly to identify a closely related organism. The IDBA-Hybrid assembler then produced a second assembly using the genome of that closely related organism. Finally, SPAdes was run again with the second assembly, produced by IDBA-Hybrid, as a hint. The results were evaluated using QUAST and BUSCO. The pipeline reduced the number of contigs and increased the N50 values of the draft genome assemblies while preserving the coverage of the draft genomes.
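The two headline metrics in this abstract, contig count and N50, can be illustrated with a minimal Python sketch. It uses the standard definition of N50: the length L such that contigs of length L or longer together cover at least half of the total assembly size. The function name `n50` and the contig lengths below are hypothetical and chosen only for illustration. They are not data from the study, and this is not the authors' evaluation code; the study itself used QUAST and BUSCO.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-empty, non-negative input


# Hypothetical contig lengths in bp, invented for illustration only.
# Both toy assemblies have the same total size (1600 bp).
fragmented = [400, 300, 300, 200, 200, 100, 100]
merged = [700, 500, 300, 100]

print(len(fragmented), n50(fragmented))  # 7 contigs, N50 = 300
print(len(merged), n50(merged))          # 4 contigs, N50 = 500
```

In this toy case the total assembly size is unchanged while the contig count falls and the N50 rises, which is the direction of improvement the abstract reports.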