Liu Shoucheng, Li Kui, Dai Xiuru, Qin Guochen, Lu Dongdong, Gao Zhaoxu, Li Xiaopeng, Song Bolong, Bian Jianxin, Ren Da, Liu Yongqi, Chen Xiaofeng, Xu Yunbi, Liu Weimin, Yang Chen, Liu Xiaoqin, Chen Shisheng, Li Jian, Li Bosheng, He Hang, Deng Xing Wang
State Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, China.
Peking-Tsinghua Center for Life Sciences, School of Life Sciences and School of Advanced Agricultural Sciences, Peking University, Beijing, China.
Nat Genet. 2025 Apr;57(4):1008-1020. doi: 10.1038/s41588-025-02137-x. Epub 2025 Apr 7.
Completely assembling large and complex plant genomes, such as the hexaploid wheat genome, remains challenging. Here we present CS-IAAS, a comprehensive telomere-to-telomere (T2T) gap-free Triticum aestivum L. genome, encompassing 14.51 billion base pairs and featuring all 21 centromeres and 42 telomeres. Annotation revealed 90.8 Mb of additional centromeric satellite arrays and 5,611 rDNA units. Genome-wide rearrangements, centromeric elements, transposable element expansion and segmental duplications associated with tetraploidization and hexaploidization were deciphered, providing a comprehensive understanding of wheat subgenome evolution. Among these, transposable element insertions during hexaploidization greatly influenced the balance of gene expression, thus increasing genome plasticity at the transcriptional level. Additionally, we generated 163,329 full-length cDNA sequences and proteomic data that helped annotate 141,035 high-confidence protein-coding genes. The complete T2T reference genome (CS-IAAS), along with its transcriptome and proteome, represents a significant step in our understanding of wheat genome complexity and provides insights for future wheat research and breeding.