Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8561, Japan.
Trans-Scale Biology Center, National Institute for Basic Biology, Okazaki, 444-8585, Japan.
BMC Genomics. 2024 Nov 5;25(1):1039. doi: 10.1186/s12864-024-10929-4.
The Japanese cedar (Cryptomeria japonica D. Don) is one of the most important Japanese forest trees, occupying approximately 44% of artificial forests, and is also planted in East Asia, the Azores Archipelago, and certain islands in the Indian Ocean. Although the huge genome of the species (ca. 9 Gbp), with its abundant repeat elements, may have been an obstacle to genetic analysis, the species is easily propagated by cuttings, induced to flower with gibberellic acid, transformed with Agrobacterium, and edited with CRISPR/Cas9. These characteristics make C. japonica a promising model conifer species, for which reference genome sequences are necessary.
Herein, we report the first chromosome-level assembly of C. japonica (2n = 22), using third-generation selfed progeny (estimated homozygosity rate = 0.96). High-molecular-weight DNA (> 50 kb) was extracted from young leaf tissue for PacBio HiFi long-read sequencing and for constructing a Hi-C/Omni-C library for Illumina short-read sequencing. De novo assembly using HiFi and Illumina reads at 29× and 26× genome coverage, respectively, yielded 2,651 contigs (9.1 Gbp; contig N50 12.0 Mbp). Hi-C analysis mapped 97% of the nucleotides to 11 chromosomes. The assembly was verified by comparison with a consensus linkage map comprising 7,781 markers. BUSCO analysis identified ~91% of the conserved genes.
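For readers unfamiliar with the contig N50 and genome coverage statistics quoted above, the following minimal Python sketch shows how such figures are conventionally computed. It is illustrative only and is not the authors' pipeline: the function names `n50` and `coverage`, the toy contig lengths, and the 264 Gbp read total are assumptions introduced here, the last chosen only because it corresponds to roughly 29× of a 9.1 Gbp assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def coverage(total_read_bases, genome_size):
    """Return mean sequencing depth: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Toy contig lengths (hypothetical, not from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Hypothetical read total for a 9.1 Gbp genome (illustrative assumption).
depth = coverage(264e9, 9.1e9)
print(toy_n50, round(depth, 1))
```

In the toy example the total is 290 bases, so the two longest contigs (80 + 70 = 150) already cover half of it and the N50 is 70. By the same definition, a contig N50 of 12.0 Mbp means that contigs of at least that length hold half or more of the assembled sequence.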
Annotations of genes and comparisons of repeat elements with other Cupressaceae and Pinaceae species provide a fundamental resource for conifer research.