Han Biao, Wang Longxin, Xian Yang, Xie Xiao-Man, Li Wen-Qing, Zhao Ye, Zhang Ren-Gang, Qin Xiaochun, Li De-Zhu, Jia Kai-Hua
Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China.
School of Biological Science and Technology, University of Jinan, Jinan, China.
Front Plant Sci. 2022 Sep 23;13:1001583. doi: 10.3389/fpls.2022.1001583. eCollection 2022.
This species (Fagaceae) is an ecologically and economically important deciduous broadleaved tree native to and widespread in East Asia. It is a valuable woody species and an indicator of local forest health, and it occupies a dominant position in East Asian forest ecosystems. However, genomic resources for the species are still lacking. Here, we present a high-quality genome assembly generated by PacBio HiFi and Hi-C sequencing. The assembled genome size is 787 Mb, with a contig N50 of 26.04 Mb and a scaffold N50 of 64.86 Mb, comprising 12 pseudo-chromosomes. Repetitive sequences constitute 67.6% of the genome; the majority are long terminal repeats, which account for 46.62% of the genome. We used several approaches, including RNA sequence-based and homology-based predictions, to identify protein-coding genes. A total of 32,466 protein-coding genes were identified, of which 95.11% could be functionally annotated. Evolutionary analysis showed that the species was more closely related to one of the compared taxa than to the other two. We found no evidence for species-specific whole-genome duplications after the species had diverged. This study provides the first genome assembly and the first gene annotation data for the species. These resources will inform the design of further breeding strategies and will be valuable for genome editing and comparative genomics in oak species.
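The contig and scaffold N50 values reported in the abstract follow the standard definition: sort the sequences from longest to shortest, accumulate their lengths, and report the length of the sequence at which the running total first reaches half of the assembly size. A minimal Python sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Longest sequences first, as the N50 definition requires.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not taken from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to the contig lengths of a real assembly, the same function would give the contig N50; applied to scaffold lengths, the scaffold N50.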