Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies.

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218.

Publication information

Genetics. 2020 Oct;216(2):599-608. doi: 10.1534/genetics.120.303501. Epub 2020 Aug 12.

DOI:10.1534/genetics.120.303501

PMID:32796007

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7536849/

Abstract

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the photoperiod response locus.
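The size figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below uses only the three numbers stated there. It assumes that the 1.2 Gbp gain refers to the same quantity (nongap sequence anchored to chromosomes) in the previous reference assembly, which the abstract implies but does not spell out. The variable and function names are illustrative.

```python
# Figures quoted in the abstract (Gbp = billion base pairs).
EXPECTED_GENOME_GBP = 16.0      # expected wheat genome size
V4_ANCHORED_GBP = 15.07         # nongap anchored sequence in Triticum_aestivum_4.0
GAIN_OVER_PREVIOUS_GBP = 1.2    # stated gain over the previous reference assembly

# Assumption: the gain is measured against anchored nongap sequence
# in the previous reference, so its size follows by subtraction.
previous_anchored_gbp = V4_ANCHORED_GBP - GAIN_OVER_PREVIOUS_GBP


def fraction_of_expected(anchored_gbp: float) -> float:
    """Return the share of the expected genome size covered by anchored sequence."""
    return anchored_gbp / EXPECTED_GENOME_GBP


v4_fraction = fraction_of_expected(V4_ANCHORED_GBP)
previous_fraction = fraction_of_expected(previous_anchored_gbp)

print(f"Implied previous anchored sequence: {previous_anchored_gbp:.2f} Gbp")
print(f"Triticum_aestivum_4.0 share of expected size: {v4_fraction:.1%}")
print(f"Previous reference share of expected size: {previous_fraction:.1%}")
```

Under that assumption, the previous reference would hold roughly 13.87 Gbp of anchored nongap sequence, so the new assembly moves from about 87% to about 94% of the expected genome size.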
