Marcolungo Luca, Vincenzi Leonardo, Ballottari Matteo, Cecchin Michela, Cosentino Emanuela, Mignani Thomas, Limongi Antonina, Ferraris Irene, Orlandi Matteo, Rossato Marzia, Delledonne Massimo
Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy.
Genartis srl, Via IV Novembre 24, 37126 Verona, Italy.
Plants (Basel). 2023 Jan 10;12(2):320. doi: 10.3390/plants12020320.
High-throughput chromosome conformation capture (Hi-C) is widely used for scaffolding in de novo assembly because it produces highly contiguous genomes, but its indirect statistical approach can introduce connection errors. We employed optical mapping (Bionano Genomics) as an orthogonal scaffolding technology to assess the structural soundness of Hi-C reconstructed scaffolds. Optical maps were used to assess the correctness of five de novo genome assemblies in which contigs were generated from long-read sequencing and scaffolded with Hi-C. Hundreds of inconsistencies were found between the reconstructions generated using the Hi-C and optical mapping approaches. Manual inspection, exploiting raw long-read sequencing data and optical maps, confirmed that several of these conflicts arose from Hi-C joining errors. Such misjoins were widespread, involved the connection of both small and large contigs, and even overlapped annotated genes. We conclude that integrating optical mapping data after, rather than before, Hi-C-based scaffolding improves the quality of the assembly and limits reconstruction errors by highlighting misjoins that can then be subjected to further investigation.
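The comparison described in the abstract, checking each Hi-C join against an orthogonal optical map, can be illustrated with a minimal sketch. This is not the authors' pipeline or the Bionano software; the `Join` and `MapAlignment` types, the `classify_joins` function, and the 50 kb `flank` threshold are all hypothetical and chosen only for illustration. A join counts as supported when one optical map alignment spans it with margin on both sides, as a candidate conflict when an alignment stops close to it, and as unassessed otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Join:
    """A contig junction introduced by Hi-C scaffolding (hypothetical type)."""
    scaffold: str
    position: int  # scaffold coordinate of the junction, in bp


@dataclass(frozen=True)
class MapAlignment:
    """An interval of a scaffold matched by one optical map (hypothetical type)."""
    scaffold: str
    start: int
    end: int


def classify_joins(joins, alignments, flank=50_000):
    """Label each join 'supported', 'conflict', or 'unassessed'.

    flank is an assumed margin (bp), not a value taken from the paper.
    """
    labels = {}
    for join in joins:
        label = "unassessed"
        for aln in alignments:
            if aln.scaffold != join.scaffold:
                continue
            # One map covering both sides of the junction supports the join.
            if (aln.start <= join.position - flank
                    and aln.end >= join.position + flank):
                label = "supported"
                break
            # A map that stops matching near the junction hints at a misjoin.
            if (abs(aln.end - join.position) <= flank
                    or abs(aln.start - join.position) <= flank):
                label = "conflict"
        labels[join] = label
    return labels


joins = [Join("s1", 1_000_000), Join("s1", 2_000_000), Join("s1", 5_000_000)]
alignments = [
    MapAlignment("s1", 500_000, 1_500_000),
    MapAlignment("s1", 1_600_000, 2_010_000),
]
labels = classify_joins(joins, alignments)
```

In this toy input the first join is spanned by a map, the second sits where a map alignment ends, and the third has no map coverage. As in the abstract, a flagged join is only a candidate for manual inspection against raw long reads and the optical maps, not a confirmed misjoin.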