Delft Bioinformatics Lab, Delft University of Technology, 2628 CD Delft, The Netherlands.
Broad Institute of MIT and Harvard, Boston, MA 02142, USA.
FEMS Yeast Res. 2017 Nov 1;17(7). doi: 10.1093/femsyr/fox074.
The haploid Saccharomyces cerevisiae strain CEN.PK113-7D is a popular model system for metabolic engineering and systems biology research. Current genome assemblies are based on short-read sequencing data scaffolded by homology to strain S288C. However, these assemblies contain large sequence gaps, particularly in subtelomeric regions, and the assumption of perfect homology to S288C during scaffolding introduces bias. In this study, we obtained a near-complete genome assembly of CEN.PK113-7D using only the Oxford Nanopore Technologies MinION sequencing platform. Fifteen of the 16 chromosomes, the mitochondrial genome and the 2-μm plasmid are each assembled in a single contig, and all but one chromosome starts or ends in a telomere repeat. Compared with the previous assembly of CEN.PK113-7D, this improved genome assembly contains 770 kbp of additional sequence carrying 248 gene annotations. Many of these genes encode functions that determine fitness under specific growth conditions and are therefore highly relevant for various industrial applications. Furthermore, we discovered a translocation between chromosomes III and VIII that caused misidentification of a MAL locus in the previous CEN.PK113-7D assembly. This study demonstrates the power of long-read sequencing by providing a high-quality reference assembly and annotation of CEN.PK113-7D, and it places a caveat on the assumed genome stability of microorganisms.