Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph.

Authors

Mukherjee Kingshuk, Rossi Massimiliano, Salmela Leena, Boucher Christina

Affiliations

Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, USA.

Department of Computer Science, Helsinki Institute for Information Technology, HIIT, University of Helsinki, Helsinki, Finland.

Publication information

Algorithms Mol Biol. 2021 May 25;16(1):6. doi: 10.1186/s13015-021-00182-9.

Abstract

Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: only one publicly available non-proprietary assembly method exists, along with one proprietary software package that is distributed as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and therefore does not scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as RMAPPER, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770-15775, 2006) ran successfully only on E. coli. Moreover, on the human genome, RMAPPER was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, RMAPPER, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper .

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b33e/8147420/e7f9909f2970/13015_2021_182_Fig1_HTML.jpg
