Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8583, Japan.
Department of Pathology, Stanford University, Stanford, California 94305, USA.
Genome Res. 2019 Jun;29(6):1009-1022. doi: 10.1101/gr.244830.118. Epub 2019 May 23.
The Caenorhabditis elegans genome was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology.