Microbial Technology Institute and State Key Laboratory of Microbial Technology, Shandong University, Qingdao, China.
State Key Laboratory of Biological Fermentation Engineering of Beer, Tsingtao Brewery Co., Ltd, Qingdao, China.
Microbiol Spectr. 2024 Apr 2;12(4):e0358223. doi: 10.1128/spectrum.03582-23. Epub 2024 Mar 15.
Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and is a crucial microorganism in industry. Currently, a huge number of genome sequences are available in the public domain. However, these genomes are distributed across different websites, and a large number of them have been released without annotation information. To provide one complete annotated genome data resource, we collected 2,507 genome assemblies and re-annotated 2,506 assemblies using a custom annotation pipeline, producing a total of 15,407,164 protein-coding gene models. With a custom pipeline, all these gene sequences were clustered into families. A total of 1,506 single-copy genes were selected as marker genes, which were then used to evaluate the genome completeness and base quality of all assemblies. Pangenomic analyses were performed on a selected subset of 847 medium-high-quality genomes. Statistical comparisons revealed a number of gene families showing copy number variations among different organism sources. To the authors' knowledge, this study represents the largest genome annotation project of S. cerevisiae so far, providing rich genomic resources for future studies of the model organism and its relatives.
IMPORTANCE Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and is a crucial microorganism in industry. Though a huge number of genome sequences are available in the public domain, these genomes are distributed across different websites and most have been released without annotation, hindering the efficient reuse of these genome resources. Here, we collected 2,507 genomes for S. cerevisiae, performed genome annotation, and evaluated the genome quality. All the obtained data have been deposited in public repositories and are freely accessible to the community. This study represents the largest genome annotation project of S. cerevisiae so far, providing one complete annotated genome data set for S. cerevisiae, an important workhorse for fundamental biology, biotechnology, and industry.
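The abstract states that single-copy marker genes were used to evaluate genome completeness but does not describe the scoring. As an illustration only, a marker-based completeness score is commonly computed as the fraction of marker families found exactly once in an assembly. The sketch below assumes that definition; the function name, the family identifiers, and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def marker_completeness(gene_families, marker_families):
    """Return (single_copy, duplicated, missing) fractions of marker families.

    gene_families: iterable of family IDs, one per annotated gene in an assembly.
    marker_families: set of family IDs expected to be single-copy.
    Illustrative definition only; not the scoring used in the study.
    """
    total = len(marker_families)
    if total == 0:
        raise ValueError("marker_families must not be empty")
    # Count how many genes of the assembly fall into each marker family.
    counts = Counter(f for f in gene_families if f in marker_families)
    single = sum(1 for m in marker_families if counts[m] == 1)
    duplicated = sum(1 for m in marker_families if counts[m] > 1)
    missing = total - single - duplicated
    return single / total, duplicated / total, missing / total


# Hypothetical toy data: four marker families, one assembly with five genes.
markers = {"FAM1", "FAM2", "FAM3", "FAM4"}
assembly_genes = ["FAM1", "FAM2", "FAM2", "FAM9", "FAM4"]
single, duplicated, missing = marker_completeness(assembly_genes, markers)
print(single, duplicated, missing)
```

In this toy case, FAM1 and FAM4 are single-copy, FAM2 is duplicated, and FAM3 is missing, so an assembly like this would score as half complete under the assumed definition.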