Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218.
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218.
Genetics. 2020 Oct;216(2):599-608. doi: 10.1534/genetics.120.303501. Epub 2020 Aug 12.
Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the photoperiod response locus.