Ichikawa Kazuki, Shoura Massa J, Artiles Karen L, Jeong Dae-Eun, Owa Chie, Kobayashi Haruka, Suzuki Yoshihiko, Kanamori Manami, Toyoshima Yu, Iino Yuichi, Rougvie Ann E, Wahba Lamia, Fire Andrew Z, Schwarz Erich M, Morishita Shinichi
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan.
Department of Pathology, Stanford University, Stanford, California 94305, USA.
Genome Res. 2025 Aug 1;35(8):1902-1918. doi: 10.1101/gr.280274.124.
The original 100.3 Mb reference genome for Caenorhabditis elegans, generated from the wild-type laboratory strain N2, has been crucial for analysis of C. elegans since 1998 and has been considered complete since 2005. Unexpectedly, this long-standing reference was shown to be incomplete in 2019 by a genome assembly from the N2-derived strain VC2010. Moreover, genetically divergent versions of N2 have arisen over decades of research and hindered reproducibility of C. elegans genetics and genomics. Here we provide a 106.4 Mb gap-free, telomere-to-telomere genome assembly of C. elegans, generated from CGC1, an isogenic derivative of the N2 strain. We use improved long-read sequencing and manual assembly of 43 recalcitrant genomic regions to overcome deficiencies of prior N2 and VC2010 assemblies and to assemble tandem repeat loci, including a 772 kb sequence for the 45S rRNA genes. Although many differences from earlier assemblies come from repeat regions, unique additions to the genome are also found. Of 19,972 protein-coding genes in the N2 assembly, 19,790 (99.1%) encode products that are unchanged in the CGC1 assembly. The CGC1 assembly may also encode 183 new protein-coding and 163 new ncRNA genes. CGC1 thus provides both a completely defined reference genome and a corresponding isogenic wild-type strain for C. elegans, allowing unique opportunities for model and systems biology.