Department of Microbial and Molecular Systems, KU Leuven, 3001 Leuven, Belgium.
Department of Biosystems, KU Leuven, 3001 Leuven, Belgium.
Bioinformatics. 2022 May 13;38(10):2920-2921. doi: 10.1093/bioinformatics/btac208.
Missing regions in short-read assemblies of prokaryote genomes are often attributed to biases in sequencing technologies and to repetitive elements. The former result in low sequencing coverage at certain loci, and the latter in unresolved loops in the de novo assembly graph. We developed SASpector, a command-line tool that compares short-read assemblies (draft genomes) with their corresponding closed assemblies and extracts the missing regions for analysis at the sequence and functional levels. SASpector can be used to evaluate whether resolved genomes are needed and can be integrated into pipelines for assembly quality control. It could also support comparative investigations of missingness in assemblies for which both short-read and long-read data are available in public databases.
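The core comparison described above can be sketched as interval arithmetic: find the parts of a closed genome that no draft contig covers. This is a minimal illustration and not SASpector's own implementation. It makes three assumptions. First, contig-to-reference alignment coordinates have already been obtained from some whole-genome aligner. Second, the reference is treated as linear, so wrap-around on circular chromosomes is ignored. Third, GC content stands in as one example of a sequence-level metric. The functions `merge_intervals`, `missing_regions` and `gc_content` are hypothetical names chosen for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open [start, end) coordinates
# of draft-contig alignments on the closed (reference) assembly.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching aligned intervals into disjoint blocks."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_regions(ref_length: int, aligned: List[Interval],
                    min_length: int = 1) -> List[Interval]:
    """Return reference intervals not covered by any aligned contig.

    Assumes a linear reference; gaps shorter than min_length are discarded.
    """
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_intervals(aligned):
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if ref_length - cursor >= min_length:
        gaps.append((cursor, ref_length))
    return gaps


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, one simple sequence-level metric."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Toy closed "genome" and made-up contig alignments for illustration only.
    reference = "AAAAAGGGGGTTTTTCCCCC"
    contig_hits = [(0, 5), (3, 8), (15, 20)]
    for start, end in missing_regions(len(reference), contig_hits):
        region = reference[start:end]
        print(start, end, region, round(gc_content(region), 3))
```

On the toy input, the only uncovered stretch is `[8, 15)` (`GGTTTTT`). In a real analysis, each extracted region could then be annotated to assess which genes fall inside it, which corresponds to the functional-level analysis the abstract mentions.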
SASpector is available at https://github.com/LoGT-KULeuven/SASpector. The tool is implemented in Python 3 and is available through pip and Docker (0mician/saspector).
Supplementary data are available at Bioinformatics online.