Biological Data Science Institute, The Australian National University, Canberra, Australia.
Research School of Biology, The Australian National University, Canberra, Australia.
Genome Biol. 2022 Mar 25;23(1):84. doi: 10.1186/s13059-022-02658-2.
Most animals and plants have more than one set of chromosomes and package these haplotypes into a single nucleus within each cell. In contrast, many fungal species carry multiple haploid nuclei per cell. Rust fungi are one such group: each cell has two nuclei (karyons), and each nucleus contains a full set of haploid chromosomes. Because the haplotypes in a dikaryon are physically separated, Hi-C chromatin contacts between haplotypes are, unlike in diploids, false-positive signals.
We generate the first chromosome-scale, fully-phased assembly for the dikaryotic leaf rust fungus Puccinia triticina and compare Nanopore MinION and PacBio HiFi sequence-based assemblies. We show that false-positive Hi-C contacts between haplotypes are predominantly caused by phase switches rather than by collapsed regions or Hi-C read mis-mappings. We introduce a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs, including a phase switch correction step. In the HiFi assembly, relatively few phase switches occur, and these are predominantly located at haplotig boundaries and can be readily corrected. In contrast, phase switches are widespread throughout the Nanopore assembly. We show that haploid genome read coverage of 30-40 times using HiFi sequencing is required for phasing of the leaf rust genome, with 0.7% heterozygosity, and that HiFi sequencing resolves genomic regions with low heterozygosity that are otherwise collapsed in the Nanopore assembly.
This first Hi-C based phasing pipeline for dikaryons and comparison of long-read sequencing technologies will inform future genome assembly and haplotype phasing projects in other non-haploid organisms.
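The idea of phasing from a Hi-C contact graph can be illustrated with a small sketch. This is not the pipeline described above. It is a simplified greedy heuristic that assumes allelic haplotig pairs are already known and that contact counts between haplotigs are available. The function names, haplotig names, and contact counts are hypothetical and chosen only for illustration. Each pair is oriented so that its members share the most Hi-C contacts with the haplotigs already placed in the same nucleus. The remaining contacts between nuclei can then be treated as candidate false positives.

```python
from collections import defaultdict


def phase_haplotig_pairs(pairs, contacts):
    """Greedily assign the two members of each allelic pair to nucleus 0 or 1.

    pairs    -- list of (haplotig, alternate_haplotig) tuples (assumed known)
    contacts -- dict mapping (haplotig_u, haplotig_v) to a Hi-C contact count
    """
    # Make the contact lookup symmetric.
    weight = defaultdict(int)
    for (u, v), n in contacts.items():
        weight[(u, v)] += n
        weight[(v, u)] += n

    phase = {}
    for x, y in pairs:
        placed = list(phase.items())
        # Score for x in nucleus 0 and y in nucleus 1.
        keep = (sum(weight[(x, h)] for h, p in placed if p == 0)
                + sum(weight[(y, h)] for h, p in placed if p == 1))
        # Score for the opposite orientation.
        swap = (sum(weight[(x, h)] for h, p in placed if p == 1)
                + sum(weight[(y, h)] for h, p in placed if p == 0))
        if keep >= swap:
            phase[x], phase[y] = 0, 1
        else:
            phase[x], phase[y] = 1, 0
    return phase


def trans_contact_fraction(phase, contacts):
    """Fraction of Hi-C contacts that link haplotigs in different nuclei."""
    total = sum(contacts.values())
    trans = sum(n for (u, v), n in contacts.items() if phase[u] != phase[v])
    return trans / total if total else 0.0


# Hypothetical toy data: three allelic pairs and their contact counts.
pairs = [("c1a", "c1b"), ("c2a", "c2b"), ("c3a", "c3b")]
contacts = {
    ("c1a", "c2b"): 90, ("c1b", "c2a"): 80, ("c1a", "c2a"): 5,
    ("c1a", "c3a"): 70, ("c2b", "c3a"): 60, ("c1b", "c3b"): 75,
    ("c2a", "c3b"): 4,
}

phase = phase_haplotig_pairs(pairs, contacts)
print(phase)
print(trans_contact_fraction(phase, contacts))
```

In a correctly phased dikaryotic assembly the fraction of contacts between nuclei should be small. A haplotig with an unusually high share of such contacts is a candidate for a phase switch, which a real pipeline would need to locate and correct at a finer scale than whole haplotigs.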