State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China.
University of Chinese Academy of Sciences, Beijing, China.
Nat Commun. 2023 Oct 17;14(1):6556. doi: 10.1038/s41467-023-42336-w.
Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
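The idea of using clipped alignments as error evidence can be illustrated with a small sketch. This is not CRAQ's implementation: the function names, the thresholds (`min_clipped`, `min_fraction`), the rule that a site clipped in only part of the covering reads is treated as possibly heterozygous, and the exponential form of `quality_index` are all assumptions made for illustration. The only external convention relied on is the SAM CIGAR string, in which `S` and `H` mark soft- and hard-clipped bases.

```python
import math
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def ref_end(pos, cigar):
    """Last reference position (1-based, inclusive) covered by the alignment."""
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    return pos + ref_len - 1

def clip_breakpoints(pos, cigar):
    """Reference positions at which this alignment is clipped."""
    ops = parse_cigar(cigar)
    breakpoints = []
    if ops and ops[0][1] in "SH":                  # clipped at the left end
        breakpoints.append(pos)
    if len(ops) > 1 and ops[-1][1] in "SH":        # clipped at the right end
        breakpoints.append(ref_end(pos, cigar))
    return breakpoints

def flag_positions(alignments, min_clipped=2, min_fraction=0.75):
    """Label positions supported by several clipped reads.

    Assumption for this sketch: if most covering reads are clipped, the
    position is a candidate assembly error; if only part of them are,
    it may instead reflect a difference between haplotypes.
    """
    clips = Counter()
    for pos, cigar in alignments:
        clips.update(clip_breakpoints(pos, cigar))
    flagged = {}
    for bp, n_clipped in clips.items():
        if n_clipped < min_clipped:
            continue
        depth = sum(1 for pos, cigar in alignments
                    if pos <= bp <= ref_end(pos, cigar))
        fraction = n_clipped / depth
        flagged[bp] = "error" if fraction >= min_fraction else "heterozygous?"
    return flagged

def quality_index(n_errors, length_bp):
    """Hypothetical 0-100 score that decays with errors per megabase."""
    return 100.0 * math.exp(-0.1 * n_errors / (length_bp / 1e6))

# Toy alignments given as (1-based start, CIGAR); real input would come from a SAM/BAM file.
reads = [
    (1, "50M10S"), (11, "40M20S"), (21, "30M5S"),   # every read clipped at position 50
    (151, "50M10S"), (161, "40M8S"),                # two reads clipped at position 200
    (151, "100M"), (171, "100M"),                   # two reads spanning position 200 intact
]
flags = flag_positions(reads)
n_errors = sum(1 for label in flags.values() if label == "error")
print(flags, quality_index(n_errors, 1_000_000))
```

In the toy data, all reads covering position 50 are clipped there, while only half of the reads covering position 200 are, so the two sites receive different labels. A real analysis would read alignments from a SAM/BAM file and would need calibrated thresholds rather than the placeholder values used here.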