Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, USA.
Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.
Bioinformatics. 2019 Sep 15;35(18):3250-3256. doi: 10.1093/bioinformatics/btz069.
Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single-molecule optical maps, called Rmaps, or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the use of optical maps in the sequence assembly step itself.
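To make the "numeric representation" concrete, the following is a minimal sketch of an in silico digest: it turns a DNA string into the list of fragment lengths between occurrences of a restriction enzyme recognition motif. The function name in_silico_digest and the choice to cut at the first base of the motif are assumptions made for illustration; they are not taken from omGraph, and real Rmaps additionally contain sizing errors and missing or spurious cut sites that this sketch ignores.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of omGraph): returns the lengths, in bases,
// of the fragments obtained by cutting `seq` at every occurrence of `motif`.
// Assumption: the cut is placed at the first base of the motif.
std::vector<std::size_t> in_silico_digest(const std::string& seq,
                                          const std::string& motif) {
    std::vector<std::size_t> fragments;
    if (motif.empty()) {
        // No recognition site defined: the whole sequence is one fragment.
        fragments.push_back(seq.size());
        return fragments;
    }
    std::size_t prev = 0;                 // position of the previous cut
    std::size_t pos = seq.find(motif);    // position of the next cut
    while (pos != std::string::npos) {
        fragments.push_back(pos - prev);  // fragment between two cuts
        prev = pos;
        pos = seq.find(motif, pos + 1);   // allow overlapping occurrences
    }
    fragments.push_back(seq.size() - prev);  // trailing fragment
    return fragments;
}
```

For example, digesting the 19-base string AAGGATCCTTTTGGATCCA with the motif GGATCC yields three fragments whose lengths sum to the sequence length.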
We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving it, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data.
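The general seed-and-extend idea can be illustrated on a much simpler setting than the one addressed here. The sketch below aligns a query list of fragment sizes to a single linear list of target fragment sizes, not to a de Bruijn graph, and it is not the omGraph algorithm. The names Hit, sizes_match and seed_and_extend, the seed length k, and the relative size tolerance tol are all hypothetical choices made for this example; missing and extra cut sites are not modeled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: where the match starts in the target and how
// many consecutive fragments were matched.
struct Hit {
    std::size_t target_start;
    std::size_t length;
};

// Two fragment sizes agree if they differ by at most a fraction `tol`
// of the larger one (an assumed, simplistic sizing-error model).
bool sizes_match(double a, double b, double tol) {
    return std::fabs(a - b) <= tol * std::max(a, b);
}

// Toy seed-and-extend on a linear target.
// Seed: the first k query fragments must match k consecutive target fragments.
// Extend: continue matching fragments one-to-one until a mismatch or the end.
std::vector<Hit> seed_and_extend(const std::vector<double>& query,
                                 const std::vector<double>& target,
                                 std::size_t k, double tol) {
    std::vector<Hit> hits;
    if (k == 0 || query.size() < k || target.size() < k) return hits;
    for (std::size_t t = 0; t + k <= target.size(); ++t) {
        bool seed_ok = true;
        for (std::size_t i = 0; i < k && seed_ok; ++i)
            seed_ok = sizes_match(query[i], target[t + i], tol);
        if (!seed_ok) continue;
        std::size_t len = k;  // the seed already covers k fragments
        while (len < query.size() && t + len < target.size() &&
               sizes_match(query[len], target[t + len], tol))
            ++len;
        hits.push_back({t, len});
    }
    return hits;
}
```

In this toy version the seed step filters candidate positions cheaply and the extension step verifies them; an alignment to a graph would instead have to follow branching paths during both steps.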
The software for aligning optical maps to a de Bruijn graph, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph.
Supplementary data are available at Bioinformatics online.