Li Xuexian, Li Hua, Lu Zhuangyue, Du Guanben, Wang Xiaoli
The Key Laboratory of Forest Resources Conservation and Utilization in the Southwest Mountains of China Ministry of Education, Southwest Forestry University, 650224, Kunming, China.
College of Materials and Chemical Engineering, Southwest Forestry University, 650224, Kunming, China.
Sci Data. 2025 Jul 1;12(1):1093. doi: 10.1038/s41597-025-05421-x.
Eucalyptus globulus Labill. is the primary plant material used for extracting eucalyptus oil. In this study, we generated a chromosome-level genome assembly of E. globulus by combining HiFi long reads, Illumina short reads and Hi-C data. The assembled genome is 556.98 Mb in size, with a contig N50 of 37.93 Mb and a scaffold N50 of 54.03 Mb. BUSCO assessment estimated the completeness of the assembly at 98.30%, and 99.05% of the genome sequence was anchored to 11 chromosomes. Repetitive sequences account for 52.55% of the genome, and 36,387 protein-coding genes were predicted, of which 96.80% were functionally annotated. Comparison of the E. globulus genome with those of six other congeneric species revealed strong conservation in gene number and gene structure. Overall, we have assembled the most complete E. globulus genome sequence to date. This assembly will serve as a valuable resource for elucidating the molecular mechanisms underlying eucalyptus oil accumulation in E. globulus leaves and for facilitating further genetic improvement.
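The contig N50 and scaffold N50 reported above are contiguity statistics. N50 is conventionally defined as the length L such that sequences of length L or longer together cover at least half of the total assembly length. The short sketch below illustrates that conventional definition. It is not the authors' pipeline: the function name `n50` and the toy contig lengths are hypothetical and are used only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition: sort lengths from longest to shortest and
    accumulate them; N50 is the length at which the running total first
    reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from this study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150, half of 300, so N50 is 70
```

The same function yields a contig N50 or a scaffold N50, depending on whether it is given contig lengths or scaffold lengths.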