LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.

Author affiliations

Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China.

College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China.

Publication information

Gigascience. 2019 Jan 1;8(1):giy157. doi: 10.1093/gigascience/giy157.

Abstract

BACKGROUND

Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and contain a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity, but current gap-closure approaches that use long reads require long runtimes and large amounts of memory. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes.

FINDINGS

We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool uses long reads generated by TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed more gaps, ran faster, and had a lower error rate and much lower memory usage than two existing state-of-the-art tools. The tool filled more gaps with raw reads than with error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and to gaps in large and complex genomes. After gap closure with this tool, the contig N50 size of the human CHM1 genome improved from 143 kb to 19 Mb, a 132-fold increase. We also closed gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. The proposed optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. Its contig N50 value was greater than 28 Mb, which is larger than that of previous non-reference assemblies of the diploid human genome.
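The contig N50 figures above depend on how gaps split scaffolds into contigs, so a small illustration may help. The following is a minimal sketch, not part of LR_Gapcloser: it assumes gaps are represented as runs of N in the scaffold sequence, and the function names (contigs_from_scaffold, n50, contig_n50) and the toy sequences are hypothetical. It shows how closing a single gap raises the contig N50.

```python
import re


def contigs_from_scaffold(seq, min_gap=1):
    """Split a scaffold at runs of N (assumed gap representation) into contigs."""
    pattern = r"[Nn]{%d,}" % min_gap
    return [c for c in re.split(pattern, seq) if c]


def n50(lengths):
    """Return the length L such that sequences of length >= L
    cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def contig_n50(scaffolds):
    """Contig N50 over a collection of scaffold sequences."""
    lengths = [len(c) for s in scaffolds for c in contigs_from_scaffold(s)]
    return n50(lengths)


# Toy scaffold with two gaps (contigs of 40, 30, and 30 bases).
before = "A" * 40 + "N" * 10 + "C" * 30 + "N" * 5 + "G" * 30
# Same scaffold after the first gap has been filled with sequence
# (contigs of 80 and 30 bases).
after = "A" * 40 + "T" * 10 + "C" * 30 + "N" * 5 + "G" * 30

print(contig_n50([before]), contig_n50([after]))
```

In this toy case the scaffold length is unchanged, but merging two contigs across the closed gap lifts the contig N50 from 30 to 80 bases. The same effect, at genome scale, underlies the reported improvement from 143 kb to 19 Mb.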

CONCLUSIONS

LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. The proposed hybrid assembly strategy that includes this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2904/6324547/d8471bcc21ff/giy157fig1.jpg
