School of Information Science and Engineering, Central South University, ChangSha 410083, China.
College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo 454000, China.
Bioinformatics. 2017 Jan 15;33(2):169-176. doi: 10.1093/bioinformatics/btw597. Epub 2016 Sep 14.
Scaffolding, which determines the orientation and order of fragmented contigs, is an essential step in assembly pipelines and can make assembly results more complete. Most existing scaffolding tools adopt scaffold graph approaches. However, because of repetitive regions in the genome, sequencing errors and uneven sequencing depth, constructing an accurate scaffold graph remains a challenging task.
In this paper, we present a novel algorithm, called BOSS, which employs paired reads for scaffolding. To construct a scaffold graph, BOSS uses the insert size distribution to decide whether an edge between two vertices (contigs) should be added and how that edge should be weighted. Moreover, BOSS adopts an iterative strategy to detect spurious edges, whose removal guarantees that the scaffold graph contains no contradictions. Based on the constructed scaffold graph, BOSS employs a heuristic algorithm to sort the vertices (contigs) and then generates scaffolds. Experimental results on real sequencing data from four genomes demonstrate that BOSS produces more satisfactory scaffolds than other popular scaffolding tools.
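To make the idea of an insert-size-based edge decision concrete, the following is a minimal sketch and not the BOSS implementation. It assumes a simplified model in which every read pair linking contig A to contig B has a forward read starting at `pos_a` on A and a reverse read ending at `pos_b` on B, so that the implied insert size is `(len_a - pos_a) + gap + pos_b`. The function names, the `min_pairs` and `k` thresholds, and the weighting rule (the fraction of consistent pairs) are all hypothetical choices made for illustration.

```python
from statistics import median


def estimate_gap(pairs, len_a, mu):
    """Estimate the gap between contig A and contig B.

    pairs: list of (pos_a, pos_b) tuples (hypothetical input format).
    Each pair suggests gap = mu - (len_a - pos_a) - pos_b, where mu is the
    mean insert size of the library. The median is used so that a few
    mis-mapped pairs do not dominate the estimate.
    """
    return median(mu - (len_a - a) - b for a, b in pairs)


def score_edge(pairs, len_a, mu, sigma, min_pairs=3, k=3.0):
    """Decide whether to add an edge A -> B and how to weight it.

    Returns None when the evidence is too weak, otherwise (gap, weight).
    A pair counts as consistent when its implied insert size, under the
    estimated gap, lies within k standard deviations of mu.
    The thresholds are illustrative assumptions, not values from BOSS.
    """
    if len(pairs) < min_pairs:
        return None
    gap = estimate_gap(pairs, len_a, mu)
    implied = [(len_a - a) + gap + b for a, b in pairs]
    consistent = sum(1 for size in implied if abs(size - mu) <= k * sigma)
    if consistent < min_pairs:
        return None
    return gap, consistent / len(pairs)


if __name__ == "__main__":
    # Three pairs agree on a gap of 200; the fourth looks mis-mapped.
    links = [(800, 100), (850, 150), (700, 0), (100, 900)]
    print(score_edge(links, len_a=1000, mu=500, sigma=50))
```

In this toy model an edge supported mostly by pairs whose implied insert sizes fall far outside the library distribution is rejected, while a well-supported edge receives a weight that reflects how much of its evidence is consistent.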
BOSS is publicly available for download at https://github.com/bioinfomaticsCSU/BOSS
Contact: jxwang@mail.csu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.