Dietrich Fred S, Magwene Paul, McCusker John
Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710 USA.
Department of Biology, Duke University, Durham, NC 27710 USA.
bioRxiv. 2025 Apr 30:2023.09.07.545205. doi: 10.1101/2023.09.07.545205.
Examination of the genome sequences of strain S288c and 93 additional, diverse strains allows identification of the 5885 genes that make up the core set of genes in this species and gives a clearer picture of the organization and plasticity of this genome. Individual strains each contain dozens to hundreds of strain-specific genes. In addition to a variable content of the retrotransposons Ty1-Ty6, some strains contain a novel transposable element, Ty7. Examination further shows that some annotated putative protein-coding genes are likely artifacts. We propose altering approximately 5% of the current annotations in the widely used reference strain S288c. Potential null alleles are common, are found in all 94 strains examined, and typically contain a single stop codon or frameshift. There are also gene remnants, pseudogenes, and variable arrays of genes. Among the core genes, only 364 protein-coding genes remain of unknown function, classified as uncharacterized in the Saccharomyces Genome Database. This work suggests that carefully edited and annotated genome sequences have a role in understanding the genome organization and content of a species. We propose that gene remnants be added to the repertoire of features recognized in the genome of this species, and likely in those of other fungal species.
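The abstract characterizes most potential null alleles as carrying a single stop codon or frameshift. As an illustration only, and not the authors' pipeline, the Python sketch below flags such alleles by comparing a strain's coding sequence against the reference. It assumes both sequences are already aligned to the annotated start codon, that the reference ends with its stop codon, and that the standard nuclear genetic code applies. The function name `classify_allele` is hypothetical. A real analysis would need a proper alignment, because two compensating indels would not be detected by a simple length comparison.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_allele(ref_cds: str, alt_cds: str) -> str:
    """Hypothetical helper: flag a potential null allele.

    Assumes both sequences start at the annotated ATG and that the
    reference ends with its stop codon. Returns 'frameshift',
    'premature_stop', or 'intact'.
    """
    ref = ref_cds.upper()
    alt = alt_cds.upper()
    # A net length change that is not a multiple of 3 shifts the reading frame.
    if (len(alt) - len(ref)) % 3 != 0:
        return "frameshift"
    # Scan every codon except the final (expected) stop for an in-frame stop.
    for i in range(0, len(alt) - 3, 3):
        if alt[i:i + 3] in STOP_CODONS:
            return "premature_stop"
    return "intact"


if __name__ == "__main__":
    ref = "ATGAAACCCGGGTAA"
    print(classify_allele(ref, "ATGAAATAGGGGTAA"))  # in-frame TAG at codon 3
    print(classify_allele(ref, "ATGAAACCGGGTAA"))   # 1-nt deletion
```

Classifying frameshifts before scanning for stops reflects the fact that, once the frame shifts, downstream codons no longer correspond to the reference reading frame.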