Chen Kun-Tze, Shen Hsin-Ting, Lu Chin Lung
Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan.
BMC Syst Biol. 2018 Dec 31;12(Suppl 9):139. doi: 10.1186/s12918-018-0654-y.
One of the important steps in assembling a genome sequence from short reads is scaffolding, in which the contigs of a draft genome are ordered and oriented into scaffolds. Several scaffolding tools based on a single reference genome have been developed. However, a single reference genome may not be sufficient on its own for a scaffolder to generate correct scaffolds of a target draft genome, especially when the target and reference genomes are evolutionarily distant or when rearrangements have occurred between them. This motivates the development of scaffolding tools that can order and orient the contigs of the target genome using multiple reference genomes.
In this work, we use a heuristic method to develop a new scaffolder called Multi-CSAR that can accurately scaffold a target draft genome based on multiple reference genomes, none of which needs to be complete. Our experimental results on real datasets show that Multi-CSAR outperforms two other multiple reference-based scaffolding tools, Ragout and MeDuSa, on average across many metrics, including sensitivity, precision, F-score, genome coverage, NGA50, scaffold number and running time.
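Sensitivity, precision and F-score for scaffolds are commonly computed over contig joins, that is, pairs of contigs placed next to each other in a scaffold, compared against the joins implied by a reference ordering. The sketch below illustrates that idea under stated assumptions: each scaffold is a list of signed contig IDs, a join and its reverse-complement reading are treated as the same join, and all names (`joins`, `evaluate`, and so on) are hypothetical. It is not Multi-CSAR's code, and the paper's exact evaluation procedure may differ.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a signed contig is (contig_id, strand), strand '+' or '-'.
Signed = Tuple[str, str]
Join = Tuple[Signed, Signed]


def flip(c: Signed) -> Signed:
    """Return the same contig read on the opposite strand."""
    cid, strand = c
    return (cid, "-" if strand == "+" else "+")


def canonical(a: Signed, b: Signed) -> Join:
    """A join a->b equals flip(b)->flip(a) read on the other strand;
    keep the lexicographically smaller form so both readings compare equal."""
    return min((a, b), (flip(b), flip(a)))


def joins(scaffolds: List[List[Signed]]) -> Set[Join]:
    """Collect canonical adjacencies between consecutive contigs of each scaffold."""
    out: Set[Join] = set()
    for scaf in scaffolds:
        for a, b in zip(scaf, scaf[1:]):
            out.add(canonical(a, b))
    return out


def evaluate(predicted: List[List[Signed]],
             reference: List[List[Signed]]) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F-score) of predicted joins vs. reference joins."""
    pred, ref = joins(predicted), joins(reference)
    tp = len(pred & ref)   # joins present in both
    fp = len(pred - ref)   # predicted joins absent from the reference
    fn = len(ref - pred)   # reference joins the scaffolder missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = sensitivity + precision
    f_score = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f_score
```

For example, a scaffold that is the exact reverse complement of the reference ordering scores perfectly, while a scaffolding that recovers only one of two reference joins has sensitivity 0.5 and precision 1.0.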
Multi-CSAR is a multiple reference-based scaffolder that can efficiently produce more accurate scaffolds of a target draft genome by referring to multiple complete and/or incomplete genomes of related organisms. Its stand-alone program is available for download at https://github.com/ablab-nthu/Multi-CSAR.