Brankovics Balázs, Zhang Hao, van Diepeningen Anne D, van der Lee Theo A J, Waalwijk Cees, de Hoog G Sybren
CBS-KNAW Fungal Biodiversity Centre, Utrecht, the Netherlands.
Institute of Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, the Netherlands.
PLoS Comput Biol. 2016 Jun 16;12(6):e1004753. doi: 10.1371/journal.pcbi.1004753. eCollection 2016 Jun.
GRAbB (Genomic Region Assembly by Baiting) is a new program dedicated to assembling specific genomic regions from next-generation sequencing (NGS) data. The approach is especially useful for multicopy regions, such as the mitochondrial genome and the rDNA repeat region. These parts of the genome are often neglected or poorly assembled, although they contain information of interest from a phylogenetic or epidemiological perspective. Single-copy regions can be assembled as well. The program can target multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data based on homology, such as the sequences used for barcoding. To make the assembly specific, a known part of the region must be supplied, such as the sequence of a PCR amplicon or a homologous sequence from a related species. Because only the region of interest is assembled, the assembly process is computationally much less demanding and may yield assemblies of better quality. This study demonstrates the different applications and functionalities of the program: exhaustive assembly (rDNA region and mitochondrial genome), extraction of homologous regions or genes (IGS, RPB1, RPB2 and TEF1α), and extraction of multiple regions within a single run. The program is also compared with MITObim, which is designed for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities of the new program (handling multiple targets simultaneously and extracting homologous regions) are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). GRAbB is also available as a Docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).
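The abstract does not describe how baiting works internally, so the following is only a minimal Python sketch of the general idea of selecting reads with a known bait sequence. It is not GRAbB's implementation. The function names (`kmers`, `revcomp`, `bait_reads`), the k-mer length and the toy sequences are all hypothetical and chosen for illustration. The sketch assumes that a read belongs to the target region if it shares at least one k-mer with the bait, and that recruited reads can themselves act as bait in later rounds.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bait_reads(bait, reads, k=5):
    """Hypothetical helper: iteratively collect reads that share a k-mer
    with the bait or with any read recruited so far (either strand)."""
    bait_kmers = kmers(bait, k)
    selected = set()
    changed = True
    while changed:  # repeat until a full pass recruits no new read
        changed = False
        for idx, read in enumerate(reads):
            if idx in selected:
                continue
            read_kmers = kmers(read, k) | kmers(revcomp(read), k)
            if read_kmers & bait_kmers:
                selected.add(idx)
                bait_kmers |= read_kmers  # recruited read extends the bait
                changed = True
    return [reads[i] for i in sorted(selected)]


# Toy data (made up): a short target, a bait taken from its start,
# two overlapping reads and one unrelated read.
target = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
bait = target[:10]
reads = [target[14:31], "GAGAGAGAGAGAGAG", target[5:20]]

print(bait_reads(bait, reads))
```

In this toy case the read `target[14:31]` shares no 5-mer with the bait itself and is recruited only after `target[5:20]` has been added, which illustrates how a selection can grow outward from a short known sequence. Only the selected reads would then be passed to an assembler, which is why assembling a single region can be much cheaper than assembling the whole dataset.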