Hebei Key Laboratory of Crop Genetics and Breeding, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, National Soybean Improvement Center Shijiazhuang Sub-Center, Hebei Academy of Agricultural and Forestry Sciences, 050035, Shijiazhuang, Hebei, China.
College of Life Sciences, Hebei Agricultural University, 071001, Baoding, Hebei, China.
BMC Genom Data. 2024 Mar 4;25(1):25. doi: 10.1186/s12863-024-01213-1.
Soybean is an important feed and oil crop worldwide owing to its high protein and oil content. China holds a collection of more than 43,000 soybean germplasm resources, which provides rich genetic diversity for soybean breeding. However, this rich genetic diversity also poses great challenges for the genetic improvement of soybean. This study reports the de novo genome assembly of HJ117, a soybean variety with a high protein content of 52.99%. These data will be a valuable resource for further research on soybean quality improvement and will aid in elucidating the regulatory mechanisms underlying soybean protein content.
We generated a contiguous reference genome of 1041.94 Mb for HJ117 using a combination of Illumina short reads (23.38 Gb) and PacBio long reads (25.58 Gb), with high-quality sequence coverage of approximately 22.44× and 24.55×, respectively. HJ117 was developed through backcross breeding, using Jidou 12 as the recurrent parent and Chamoshidou as the donor parent. The assembly was further aided by 114.5 Gb of Hi-C data (109.9×), resulting in a contig N50 of 19.32 Mb and a scaffold N50 of 51.43 Mb. Notably, Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking Universal Single-Copy Orthologs (BUSCO) assessments indicated that most core eukaryotic genes (97.18%) and genes in the BUSCO dataset (99.4%) were identified, and 96.44% of the genomic sequence was anchored onto 20 pseudochromosomes.
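The coverage and N50 figures above follow from simple arithmetic, which the minimal Python sketch below illustrates. It assumes that coverage is total sequenced bases divided by the assembly size (the reported depths appear consistent with this) and uses the common definition of N50. The function names `coverage` and `n50` and the toy contig lengths are hypothetical and are not taken from the study's pipeline.

```python
def coverage(data_gb, genome_mb):
    """Sequencing depth, assumed here to be total bases / assembly size."""
    return data_gb * 1000 / genome_mb


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


GENOME_MB = 1041.94  # assembly size reported for HJ117

# Data volumes (Gb) as reported in the abstract.
for name, gb in [("Illumina", 23.38), ("PacBio", 25.58), ("Hi-C", 114.5)]:
    print(f"{name}: {coverage(gb, GENOME_MB):.2f}x")

# Toy contig lengths (hypothetical), only to show how N50 is computed.
print("toy N50:", n50([2, 3, 4, 5, 6]))
```

With the reported values, this prints depths of 22.44x, 24.55x, and 109.89x, which match the abstract after rounding, and a toy N50 of 5. Applied to the real contig or scaffold lengths from the assembly FASTA, the same `n50` function would be expected to give the contig and scaffold N50 values quoted above.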