BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.
BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.
BMC Bioinformatics. 2021 Mar 25;22(1):158. doi: 10.1186/s12859-021-04081-z.
Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for specific SLR techniques, a robust and efficient standalone scaffolder for hybrid genome assembly is still needed.
In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, that links contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also employed a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read (stLFR) sequencing dataset and increased the scaffold NG50 of the corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach outperformed existing SLR scaffolders overall, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder .
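To illustrate the idea of a scaffold graph built on Jaccard similarity, here is a minimal Python sketch under simplifying assumptions: each contig is represented only by the set of barcodes from reads aligned to it, and two contigs are joined by an edge when the Jaccard similarity of their barcode sets, |A ∩ B| / |A ∪ B|, reaches a hypothetical cutoff (`min_similarity`). The function names, the cutoff, and this contig-level data layout are illustrative assumptions, not SLR-superscaffolder's actual code; the tool's steps for ordering and orienting contigs, screening misassemblies, and local refinement with paired-end reads are not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two barcode sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_scaffold_graph(contig_barcodes: dict, min_similarity: float = 0.1) -> dict:
    """Connect contig pairs whose barcode sets overlap enough.

    contig_barcodes: contig name -> set of co-barcodes (hypothetical input layout).
    min_similarity: hypothetical edge cutoff chosen for illustration.
    Returns an undirected adjacency dict: contig -> {neighbour: similarity}.
    """
    graph = {name: {} for name in contig_barcodes}
    for u, v in combinations(contig_barcodes, 2):
        sim = jaccard(contig_barcodes[u], contig_barcodes[v])
        if sim >= min_similarity:
            # Store the edge in both directions because the graph is undirected.
            graph[u][v] = sim
            graph[v][u] = sim
    return graph


if __name__ == "__main__":
    contigs = {
        "ctg1": {"BC01", "BC02", "BC03"},
        "ctg2": {"BC02", "BC03", "BC04"},
        "ctg3": {"BC09"},
    }
    print(build_scaffold_graph(contigs))
```

Here `ctg1` and `ctg2` share two of four distinct barcodes (similarity 0.5), so they are linked, while `ctg3` shares none and stays isolated. Contigs that are physically close in the genome tend to be covered by the same long DNA fragments and therefore share barcodes, which is why barcode overlap can serve as a proxy for adjacency.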
SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.