Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, P.O. Box 68 (Gustaf Hällströmin katu 2b), Helsinki, 00014, Finland.
BMC Bioinformatics. 2012 Oct 3;13:255. doi: 10.1186/1471-2105-13-255.
Developing genome assembly tools requires comprehensive and efficiently computable validation measures for assessing assembly quality. The most commonly used measure, N50, summarizes an assembly by the length of the scaffold (or contig) that overlaps the midpoint of the length-ordered concatenation of scaffolds (contigs). Especially for scaffold assemblies, it is non-trivial to combine a correctness measure with the N50 value, and the current methods for doing so are rather involved.
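As an illustration of the N50 definition above, the following minimal sketch sorts the lengths from longest to shortest and returns the length of the first scaffold whose cumulative sum reaches half of the total. The function name `n50` and the convention of returning 0 for an empty assembly are assumptions made for this example, not part of the published tool.

```python
def n50(lengths):
    """Length of the scaffold covering the midpoint of the
    length-ordered (longest first) concatenation of scaffolds."""
    if not lengths:
        return 0  # assumed convention for an empty assembly
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold whose cumulative length reaches the
        # midpoint is the one that overlaps it.
        if running >= half:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

For scaffold lengths 2, 3, 4, 5 and 6 the total is 20, so the midpoint lies at 10; the cumulative sums 6 and 11 show that the scaffold of length 5 covers it.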
We propose a simple but rigorous normalized N50 assembly metric that combines N50 with such a correctness measure: the assembly is split into as many parts as necessary for each part to align to the reference. For scalability, we first compute maximal local approximate matches between the scaffolds and the reference in a distributed manner, and then apply co-linear chaining to find a global alignment. The best alignment is removed from the scaffold, and the process is iterated on the remaining scaffold content in order to split the scaffold into correctly aligning parts. The proposed normalized N50 metric is then the N50 value computed for the final correctly aligning parts. As a side result of independent interest, we show how to modify co-linear chaining to restrict gaps so that it produces a more sensible global alignment.
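The iterative splitting described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. Local matches are assumed to be given as hypothetical anchors `(s_start, s_end, r_start, r_end)` with half-open coordinates on the scaffold and the reference. Chaining is done with a quadratic-time dynamic program rather than an efficient chaining algorithm. A chain is scored by the total scaffold length of its anchors, and the gap restriction is modeled by a single assumed parameter `max_gap` applied to both sequences. The length of a part is taken to be the scaffold span of its chain.

```python
def n50(lengths):
    """N50 of a list of lengths (0 for an empty list, by assumption)."""
    ordered = sorted(lengths, reverse=True)
    half, running = sum(ordered) / 2, 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def best_chain(anchors, max_gap):
    """Highest-scoring co-linear chain with gaps of at most max_gap."""
    order = sorted(anchors)  # ascending by scaffold start
    score = [a[1] - a[0] for a in order]
    prev = [-1] * len(order)
    for j, (s0, s1, r0, _r1) in enumerate(order):
        for i in range(j):
            s_gap = s0 - order[i][1]  # gap on the scaffold
            r_gap = r0 - order[i][3]  # gap on the reference
            # Co-linear (no overlap, same order) and gap-restricted.
            if 0 <= s_gap <= max_gap and 0 <= r_gap <= max_gap:
                cand = score[i] + (s1 - s0)
                if cand > score[j]:
                    score[j], prev[j] = cand, i
    if not order:
        return []
    j = max(range(len(order)), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


def split_scaffold(anchors, max_gap):
    """Repeatedly remove the best chain; return the part lengths."""
    remaining = [a for a in anchors if a[1] > a[0]]  # drop empty anchors
    parts = []
    while remaining:
        chain = best_chain(remaining, max_gap)
        lo, hi = chain[0][0], chain[-1][1]
        parts.append(hi - lo)
        # Keep only anchors lying entirely outside the removed span.
        remaining = [a for a in remaining if a[1] <= lo or a[0] >= hi]
    return parts


def normalized_n50(anchors_by_scaffold, max_gap):
    """N50 over the correctly aligning parts of all scaffolds."""
    parts = []
    for anchors in anchors_by_scaffold.values():
        parts.extend(split_scaffold(anchors, max_gap))
    return n50(parts)


if __name__ == "__main__":
    toy = [(0, 100, 1000, 1100), (110, 200, 1110, 1200),
           (300, 400, 5000, 5100)]
    print(split_scaffold(toy, max_gap=50))
    print(normalized_n50({"scaffold_1": toy}, max_gap=50))
```

In the toy input, the third anchor maps far away on the reference, so with `max_gap=50` the scaffold is split into one part spanning the first two anchors and a second part for the third. A much larger `max_gap` would let all three anchors chain into a single part, which shows why restricting gaps matters for the resulting metric.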
We propose and implement a comprehensive and efficient approach to compute a metric that summarizes scaffold assembly correctness and length. Our implementation can be downloaded from http://www.cs.helsinki.fi/group/scaffold/normalizedN50/.