Department of Software Engineering, University of Isfahan, Iran.
National Institute for Genetic, Engineering & Biotechnology, (NIGEB), Tehran, Iran.
J Bioinform Comput Biol. 2021 Apr;19(2):2150005. doi: 10.1142/S0219720021500050. Epub 2021 Apr 16.
The de Bruijn Graph algorithm (DBG) as one of the cornerstones algorithms in short read assembly has extended with the rapid advancement of the Next Generation Sequencing (NGS) technologies and low-cost production of millions of high-quality short reads. Erroneous reads, non-uniform coverage, and genomic repeats are three major problems that influence the performance of short read assemblers. To encounter these problems, the iterative DBG algorithm applies multiple [Formula: see text]-mers instead of a single [Formula: see text]-mer, by iterating the DBG graph over a range of [Formula: see text]-mer sizes from the minimum to the maximum. However, the iteration paradigm of iterative DBG deals with complex graphs from the beginning of the algorithm and therefore, causes more potential errors and computational time for resolving various unreal branches. In this research, we propose the Reverse Modified Iterative DBG graph (named RMI-DBG) for short read assembly. RMI-DBG utilizes the DBG algorithm and String graph to achieve the advantages of both algorithms. We present that RMI-DBG performs faster with comparable results in comparison to iterative DBG. Additionally, the quality of the proposed algorithm in terms of continuity and accuracy is evaluated with some commonly-used assemblers via several real datasets of the GAGE-B benchmark.
De Bruijn 图算法(DBG)作为短读测序组装的基石算法之一,随着下一代测序(NGS)技术的快速发展和数百万高质量短读的低成本生产而得到扩展。错误的读段、不均匀的覆盖和基因组重复是影响短读段组装器性能的三个主要问题。为了解决这些问题,迭代 DBG 算法应用多个 [Formula: see text]-mers 而不是单个 [Formula: see text]-mer,通过在从最小到最大的 [Formula: see text]-mer 大小范围内迭代 DBG 图来实现。然而,迭代 DBG 的迭代范例从算法开始就处理复杂的图,因此会导致更多的潜在错误和解决各种非真实分支的计算时间。在这项研究中,我们提出了用于短读段组装的反向修改迭代 DBG 图(称为 RMI-DBG)。RMI-DBG 利用 DBG 算法和字符串图来实现这两种算法的优势。我们提出的算法在速度上比迭代 DBG 更快,并且在连续性和准确性方面具有可比的结果。此外,通过 GAGE-B 基准的几个真实数据集,使用一些常用的组装器评估了该算法的质量。