Aganezov Sergey S, Alekseyev Max A
Princeton University, 35 Olden St., Princeton, 08450, NJ, USA.
ITMO University, 49 Kronverksky Pr., St. Petersburg, 197101, Russia.
BMC Bioinformatics. 2017 Dec 6;18(Suppl 15):496. doi: 10.1186/s12859-017-1919-y.
Despite recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While a number of methods exist for reconstructing a genome from its scaffolds, using various computational and wet-lab techniques, they can often produce only partial, error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting conflicts for further investigation. These tasks can be labor-intensive if performed manually.
We present CAMSA, a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated against existing methods: it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of total running time.
CAMSA addresses the current lack of tools for automated comparison and analysis of multiple assemblies of the same set of scaffolds. Since numerous methods and techniques exist for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for developers of scaffold assembly algorithms and for researchers focused on improving draft assemblies of specific organisms.
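To make the idea of comparing and merging scaffold assemblies concrete, here is a minimal Python sketch. It is not CAMSA's algorithm, input format, or API, none of which are described in this abstract. It assumes each assembly is given as ordered chains of oriented scaffolds, counts how many assemblies support each adjacency between scaffold ends, and greedily keeps the best-supported adjacencies that do not conflict. The names `adjacencies`, `support`, `greedy_merge`, and the toy data are all hypothetical, and the sketch does not guard against circular chains.

```python
from collections import Counter


def adjacencies(chain):
    """Convert an ordered chain of (scaffold, strand) pairs into adjacencies.

    An adjacency is a frozenset of two scaffold extremities; an extremity is
    (name, "t") for the tail or (name, "h") for the head of a scaffold.
    """
    result = set()
    for (left, left_strand), (right, right_strand) in zip(chain, chain[1:]):
        # A forward scaffold ends with its head; a reversed one ends with its tail.
        left_end = (left, "h" if left_strand == "+" else "t")
        # A forward scaffold starts with its tail; a reversed one starts with its head.
        right_end = (right, "t" if right_strand == "+" else "h")
        result.add(frozenset((left_end, right_end)))
    return result


def support(assemblies):
    """Count, for every adjacency, how many assemblies contain it."""
    counts = Counter()
    for chains in assemblies.values():
        for adj in set().union(*(adjacencies(c) for c in chains)):
            counts[adj] += 1
    return counts


def greedy_merge(counts):
    """Keep adjacencies by decreasing support, skipping any that reuse an extremity."""
    used = set()
    merged = []
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for adj, n in ordered:
        if used.isdisjoint(adj):  # each scaffold end may be joined only once
            merged.append((adj, n))
            used.update(adj)
    return merged


# Hypothetical toy input: three assemblies of the same scaffolds s1, s2, s3.
assemblies = {
    "method1": [[("s1", "+"), ("s2", "+"), ("s3", "+")]],
    "method2": [[("s1", "+"), ("s2", "-")]],
    "method3": [[("s1", "+"), ("s2", "+")]],
}

counts = support(assemblies)
merged = greedy_merge(counts)
for adj, n in merged:
    print(sorted(adj), "supported by", n, "assemblies")
```

In this toy input, method2 places s2 in the opposite orientation after s1, so its adjacency competes for the head of s1 with the adjacency reported by method1 and method3. The greedy step keeps the better-supported one and leaves the other out of the merged result, which is the kind of conflict a comparative report would flag for further investigation.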