Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Genome Biol. 2020 Sep 14;21(1):245. doi: 10.1186/s13059-020-02134-9.
Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.
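The k-mer comparison described above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the following are assumptions: a simple k-mer survival model in which the fraction of assembly k-mers also found in the reads, raised to the power 1/k, approximates per-base correctness, and completeness taken as the fraction of read k-mers recovered in the assembly. The function names `kmers` and `kmer_qv` are hypothetical, plain Python sets stand in for an efficient k-mer database, distinct k-mers are used rather than multiplicity-aware counts, and no filtering of erroneous low-frequency read k-mers is applied.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_qv(assembly, reads, k):
    """Estimate a Phred-scaled consensus quality (QV) and k-mer completeness.

    Assumed model (not taken from the abstract): a k-mer survives only if
    all k of its bases are correct, so per-base correctness is approximated
    by (shared / total) ** (1 / k).
    """
    asm = kmers(assembly, k)
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    # Assembly k-mers that are supported by at least one read.
    shared = asm & read_kmers

    p_correct = (len(shared) / len(asm)) ** (1.0 / k)
    error = 1.0 - p_correct
    qv = math.inf if error == 0 else -10.0 * math.log10(error)

    # Fraction of read k-mers that the assembly recovers.
    completeness = len(shared) / len(read_kmers)
    return qv, completeness


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGTA"]
    print(kmer_qv("ACGTACGGTA", reads, k=3))  # assembly consistent with reads
    print(kmer_qv("ACGTTCGGTA", reads, k=3))  # assembly with one substitution
```

In this toy input, a single substitution introduces several assembly-only k-mers, which lowers both the estimated QV and the completeness. A real evaluation would use a much larger k and a dedicated k-mer counter rather than in-memory sets.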