Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
Gigascience. 2022 Mar 24;11. doi: 10.1093/gigascience/giac028.
Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome.
Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present 2 chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy >QV46, contig N50 >18 Mb, BUSCO completeness of 99%, and 35k phased gene loci, it is the most accurate, continuous, complete, and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development, and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue specific and inconsistent across different tissues. Direction-shifting was observed in <2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness, and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy, and continuity.
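The two headline assembly metrics quoted above, consensus QV and contig N50, follow standard definitions that can be sketched in a few lines. This is a minimal illustration only: it assumes QV is the usual Phred-scaled value (QV = -10 log10 of the per-base error probability) and that N50 is the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly size. The function names and the toy contig lengths are hypothetical and are not taken from the study or its pipeline.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy contig lengths, for illustration only.
    toy_contigs = [2, 3, 4, 5, 6]
    print("N50 of toy contigs:", n50(toy_contigs))
    # A QV of 46 corresponds to roughly 2.5 errors per 100,000 bases.
    print("Error rate at QV46:", qv_to_error_rate(46))
```

Read this way, a consensus accuracy above QV46 means fewer than about one error per 40,000 bases, and a contig N50 above 18 Mb means that at least half of the assembled sequence lies in contigs of 18 Mb or longer.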