Genoscope, Institut de biologie François-Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, Evry, France.
CIRAD, UMR AGAP, Montpellier, France.
Nat Plants. 2018 Nov;4(11):879-887. doi: 10.1038/s41477-018-0289-4. Epub 2018 Nov 2.
Plant genomes are often characterized by a high proportion of repetitive sequence and by polyploidy. Consequently, assembling plant genomes is challenging. The introduction of short-read technologies 10 years ago substantially increased the number of available plant genomes, but these assemblies are generally incomplete and fragmented, and only a few are at the chromosome scale. Recently commercialized sequencing technologies from Pacific Biosciences and Oxford Nanopore can sequence long DNA fragments (kilobases to megabases) and, combined with efficient algorithms, provide high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, even though long-read genome assemblies exhibit high contig N50s (>1 Mb), these methods are still insufficient to decipher genome organization at the chromosome level. Here, we describe a strategy based on long reads (MinION or PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies, and we demonstrate its applicability by generating high-quality genome sequences for two new dicotyledon morphotypes, Brassica rapa Z1 (yellow sarson) and Brassica oleracea HDEM (broccoli), and one new monocotyledon, Musa schizocarpa (banana). All three assemblies show contig N50s of >5 Mb and contain scaffolds that represent entire chromosomes or chromosome arms.
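Contig N50, the contiguity metric quoted above, is conventionally defined as the length L such that contigs of length L or longer together cover at least half of the total assembly length. The snippet below is a minimal sketch of that standard definition, not the authors' pipeline; it assumes contig lengths are available as a plain list of integers in base pairs, and the function name `contig_n50` is hypothetical.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper illustrating the usual definition: sort contigs
    from longest to shortest and return the length of the contig at which
    the cumulative sum first reaches half of the total assembly length.
    """
    # Ignore empty or invalid entries and sort longest first.
    sorted_lengths = sorted((n for n in lengths if n > 0), reverse=True)
    if not sorted_lengths:
        raise ValueError("no contigs with positive length")

    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to avoid float rounding.
        if 2 * running >= total:
            return length

    # Not reachable: running eventually equals total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Toy assembly of seven contigs, 2 to 8 Mb (made-up numbers).
    toy = [8_000_000, 7_000_000, 6_000_000, 5_000_000,
           4_000_000, 3_000_000, 2_000_000]
    print(contig_n50(toy))  # prints 6000000
```

In the toy example the total length is 35 Mb; the three longest contigs (8 + 7 + 6 = 21 Mb) are the first to pass the 17.5 Mb halfway point, so the N50 is 6 Mb. Under this definition, an assembly whose contig N50 exceeds 5 Mb has at least half of its sequence in contigs of 5 Mb or longer.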