Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, 52062 Aachen, Germany.
Commissariat à l'Energie Atomique et aux Energies Alternatives, Genoscope, 91057 Evry, France.
Plant Cell. 2017 Oct;29(10):2336-2348. doi: 10.1105/tpc.17.00521. Epub 2017 Oct 12.
Updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing technology was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome is structurally highly similar to that of the reference LA716 accession but has a high error rate and is rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed versus the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
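For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `contig_n50` and the toy lengths are illustrative assumptions and are not taken from the study's pipeline or data.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper for illustration; not part of the published pipeline.
    """
    # Sort contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to at least half of
        # the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths in bp, invented for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(contig_n50(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA file, so a reported N50 of 2.5 Mb means that contigs of at least that length together cover half of the assembled sequence.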