Wang Zijian, Miao Lingfeng, Tan Kaiwen, Guo Weilong, Xin Beibei, Appels Rudi, Jia Jizeng, Lai Jinsheng, Lu Fei, Ni Zhongfu, Fu Xiangdong, Sun Qixin, Chen Jian
State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, China Agricultural University, Beijing 100193, China.
Frontiers Science Center for Molecular Design Breeding (Ministry of Education), China Agricultural University, Beijing 100193, China; State Key Laboratory for Agrobiotechnology, Key Laboratory of Crop Heterosis Utilization (Ministry of Education), Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, China.
Mol Plant. 2025 May 5;18(5):892-907. doi: 10.1016/j.molp.2025.02.002. Epub 2025 Feb 13.
A complete reference genome assembly is crucial for biological research and genetic improvement. Because the wheat genome is large and highly repetitive, the globally used Chinese Spring (CS) genome assembly contains numerous gaps. In this study, we generated a 14.46 Gb near-complete assembly of the CS genome, with a contig N50 of over 266 Mb and an overall base accuracy of 99.9963%. Of the 290 gaps that remained (26, 257, and 7 in the A, B, and D subgenomes, respectively), 278 were extremely high-copy tandem repeats and the other 12 were associated with transposable elements. Four chromosome assemblies (chr1D, chr3D, chr4D, and chr5D) were completely gap-free. Extensive annotation of the near-complete genome revealed 151,405 high-confidence genes, of which 59,180 were newly annotated, including 7,602 newly assembled genes. Except for the centromere of chr1B, which has a gap associated with superlong GAA repeat arrays, the centromeric sequences of the remaining 20 chromosomes were completely assembled. Our near-complete assembly revealed that the extent of tandem repeats, such as simple-sequence repeats, was highly uneven among the subgenomes. Similarly, the repeat composition of the centromeres varied among the three subgenomes. With the genomic sequences of all six types of seed storage proteins (SSPs) fully assembled, the expression of ω-gliadin was found to be contributed entirely by the B subgenome, whereas the expression of the other five types of SSPs was most abundant from the D subgenome. The near-complete CS genome will serve as a valuable resource for genomic and functional genomic research and for the breeding of wheat and its related species.
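For readers less familiar with the assembly metrics quoted in the abstract, the following Python sketch shows how a contig N50 is conventionally computed and how a base accuracy percentage maps onto a Phred-scaled quality value. It is an illustration only and not the authors' pipeline: the function names `n50` and `phred_qv` are hypothetical, and the toy contig lengths are invented for the example. Only the accuracy figure and the gap counts are taken from the abstract.

```python
import math


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one non-zero contig length")


def phred_qv(accuracy_percent):
    """Convert a per-base accuracy (in percent) to a Phred-scaled QV."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


# Toy contig lengths (illustrative values, NOT from the paper).
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)

# Base accuracy reported in the abstract.
reported_qv = phred_qv(99.9963)

# Gap counts reported in the abstract, tallied two ways.
gaps_by_subgenome = {"A": 26, "B": 257, "D": 7}
gaps_by_type = {"tandem_repeat": 278, "transposable_element": 12}
total_gaps = sum(gaps_by_subgenome.values())

if __name__ == "__main__":
    print(f"toy N50: {toy_n50}")
    print(f"QV for 99.9963% accuracy: {reported_qv:.1f}")
    print(f"total gaps: {total_gaps}")
```

Under this standard conversion, an accuracy of 99.9963% corresponds to an error rate of 3.7 × 10⁻⁵, or roughly QV 44.3. Both gap tallies sum to the 290 stated in the abstract.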