SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme.

Affiliations

BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.

BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.

Publication information

BMC Bioinformatics. 2021 Mar 25;22(1):158. doi: 10.1186/s12859-021-04081-z.

Abstract

BACKGROUND

Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust and efficient standalone scaffolder is still needed for hybrid genome assembly.

RESULTS

In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also used a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single-tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder.
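To make the co-barcoding idea concrete, here is a minimal sketch of how Jaccard similarity between the barcode sets of two contigs could be turned into candidate edges of a scaffold graph. It is an illustration only, not the tool's implementation: the function names, the `min_js` threshold, and the toy barcode sets are all hypothetical, and a real scaffolder would typically compare barcodes near contig ends rather than over whole contigs.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def barcode_similarity_edges(contig_barcodes, min_js=0.1):
    """Return (contig1, contig2, similarity) for contig pairs sharing barcodes.

    contig_barcodes maps a contig name to the set of barcodes of the reads
    aligned to it. min_js is a hypothetical cutoff, not a value from the paper.
    """
    edges = []
    for c1, c2 in combinations(sorted(contig_barcodes), 2):
        js = jaccard(contig_barcodes[c1], contig_barcodes[c2])
        if js >= min_js:
            edges.append((c1, c2, js))
    # Strongest co-barcoding evidence first.
    return sorted(edges, key=lambda e: e[2], reverse=True)


# Toy input (hypothetical): contigA and contigB share two barcodes.
toy = {
    "contigA": {"bc1", "bc2", "bc3", "bc4"},
    "contigB": {"bc3", "bc4", "bc5"},
    "contigC": {"bc9"},
}
edges = barcode_similarity_edges(toy)
print(edges)
```

In this toy case only the contigA/contigB pair passes the cutoff, since contigC shares no barcode with the others. Ordering and orienting contigs from such edges, and the paired-end refinement step, are beyond what this sketch covers.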

CONCLUSIONS

SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
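Contiguity is reported above as scaffold NG50, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the estimated genome size. The sketch below computes it under that standard definition; the function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches half of genome_size; the length of the last sequence
    taken is the NG50. Returns 0 if the sequences never reach that target.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Toy example (hypothetical lengths): joining contigs raises the NG50.
draft = [50, 40, 30, 20, 10]
scaffolded = [120, 30]  # same total length, fewer and longer sequences
print(ng50(draft, 200), ng50(scaffolded, 200))
```

Unlike N50, which uses the total assembly length as the reference, NG50 uses the genome size, so values are comparable between assemblies of the same genome.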

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e0d/7993450/b4a567e5ab62/12859_2021_4081_Fig1_HTML.jpg
