Ryu Byeong-Ryeol, Gim Gyeong-Ju, Shin Ye-Rim, Kang Min-Ji, Kim Min-Jun, Kwon Tae-Hyung, Lim Young-Seok, Park Sang-Hyuck, Lim Jung-Dae
Department of Bio-Health Convergence, Kangwon National University, Chuncheon, 24341, Republic of Korea.
Institute of Cannabis Research, Colorado State University-Pueblo, 2200 Bonforte Blvd, Pueblo, CO, 81001-4901, USA.
Sci Data. 2024 Dec 28;11(1):1442. doi: 10.1038/s41597-024-04288-8.
As molecular research on hemp (Cannabis sativa L.) advances, there is a growing need for more diverse genome data and more accurate genome assemblies. In this study, we report a three-way assembly of the cannabidiol (CBD)-rich cannabis cultivar 'Pink Pepper', built from three sequencing technologies: PacBio Single Molecule Real-Time (SMRT) sequencing, Illumina sequencing, and Oxford Nanopore Technologies (ONT) sequencing. The assembly anchors scaffolds to the ten chromosomes of hemp; to avoid confusion with previous cannabis genetic research, the chromosomes are labeled according to an earlier reference genome. The total assembled genome length is 770 Mbp, with a GC content of 34.09% and repeat regions accounting for 77.13% of the genome. Evaluated against three different BUSCO databases, this assembly, which combines the strengths of the three sequencing technologies, achieved the highest complete BUSCO scores (97.8%-99.6%) among reported cannabis genomes. With annotations for 30,459 protein-coding genes, this dataset can serve as a valuable resource for advancing genetic research on hemp.