Department of Computer Science and Engineering, UC Riverside, CA, USA.
Department of Botany and Plant Sciences, UC Riverside, CA, USA.
Bioinformatics. 2018 Jul 1;34(13):i43-i51. doi: 10.1093/bioinformatics/bty255.
De novo genome assembly is a challenging computational problem due to the high repetitive content of eukaryotic genomes and the imperfections of sequencing technologies (i.e. sequencing errors, uneven sequencing coverage and chimeric reads). Several assembly tools are currently available, each with its own strengths and weaknesses in handling the trade-off between maximizing contiguity and minimizing assembly errors (e.g. mis-joins). To obtain the best possible assembly, it is common practice to generate multiple assemblies from several assemblers and/or parameter settings and then try to identify the highest-quality one. Unfortunately, there is often no single assembly that both maximizes contiguity and minimizes assembly errors, so one objective has to be sacrificed for the other.
The concept of assembly reconciliation has been proposed as a way to obtain a higher-quality assembly by merging or reconciling all the available assemblies. While several reconciliation methods have been introduced in the literature, we have shown in one of our recent papers that none of them can consistently produce assemblies that are better than the assemblies provided as input. Here we introduce Novo&Stitch, a novel method that takes advantage of optical maps to accurately carry out assembly reconciliation (assuming that the assembled contigs are sufficiently long to be reliably aligned to the optical maps, e.g. 50 Kbp or longer). Experimental results demonstrate that Novo&Stitch can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness.
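N50, the contiguity measure cited above, is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The minimal Python sketch below shows the standard computation. The function name `n50` is our own illustrative choice and is not part of Novo&Stitch.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are visited from longest to shortest. The length of the contig
    at which the running total first reaches at least half of the total
    assembly length is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy assembly: total = 300 bp, half = 150 bp.
# The two longest contigs (100 + 80 = 180 bp) pass the halfway mark, so N50 = 80.
print(n50([100, 80, 50, 40, 30]))  # prints 80
```

Because N50 depends only on contig lengths, merging correctly ordered contigs into longer ones raises it, while mis-joins would raise it just as easily. That is why contiguity gains must be reported together with mis-join and completeness checks.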
Novo&Stitch can be obtained from https://github.com/ucrbioinfo/Novo_Stitch.