Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany.
German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany.
BMC Genomics. 2021 Nov 14;22(1):822. doi: 10.1186/s12864-021-08115-x.
We benchmarked sequencing technologies and assembly strategies for short-read, long-read, and hybrid assemblers with respect to the correctness, contiguity, and completeness of assemblies of Francisella tularensis genomes. The benchmarking allowed in-depth analyses of the genomic structures of the Francisella pathogenicity islands and of insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation "short-read" and third-generation "long-read" sequencing methods.
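Assembly contiguity is commonly summarized with the N50 statistic. The abstract does not state which contiguity metrics were used, so the following is only a minimal sketch that assumes N50 as the measure. The function name `n50` and the contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the largest length L such that
    contigs of length >= L together cover at least half of the total
    assembly length. (Assumed contiguity metric, for illustration only.)"""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (not results from the benchmark).
print(n50([1_000_000, 500_000, 300_000, 150_000, 50_000]))  # prints 1000000
```

A fragmented assembly with many short contigs yields a lower N50 than a near-complete one, which is why long-read-first strategies typically score higher on this kind of measure.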
We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure, with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of the eight short-read assembly methods evaluated, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon), and short-read-first assembly followed by scaffolding with long reads (Unicycler, SPAdes). Hybrid assembly resolved large repetitive regions best with the "long-read first" approach.
The genomic structures of the Francisella pathogenicity islands were frequently misassembled. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. The phylogenetic structure of the insertion sequences and their evolution within the clades elucidated the clade structure of the highly conserved F. tularensis.