Louis Ed
Institute of Genetics, Queens Medical Centre, University of Nottingham, Nottingham, UK.
Methods Mol Biol. 2011;759:31-40. doi: 10.1007/978-1-61779-173-4_2.
In the early days of the yeast genome sequencing project, gene annotation was in its infancy and suffered from many false-positive annotations as well as missed genes. The lack of other sequences for comparison also prevented the annotation of conserved, functional non-coding sequences. We are now in an era of comparative genomics in which many closely related and more distantly related genomes are available for direct sequence and synteny comparisons, and conservation allows more confident prediction of genes and other functional sequences. We also have a plethora of functional genomics data, which helps inform gene annotation for previously uncharacterised open reading frames (ORFs)/genes. For Saccharomyces cerevisiae, this has resulted in continuous updating of the gene and functional sequence annotations in the reference genome, helping it retain its position as the best-characterised eukaryotic genome. A single reference genome does not accurately describe a species, and this is quite clear in the case of S. cerevisiae, where the reference strain is not ideal for brewing or baking because it lacks certain genes. Recent surveys of numerous isolates from a variety of sources, using a variety of technologies, have revealed a great deal of variation amongst isolates, with genome sequence surveys providing information on novel genes that are undetectable by other means. We now have a better understanding of the extant variation in S. cerevisiae as a species, as well as some idea of how much is still missing from this understanding. As with gene annotation, comparative genomics enhances the discovery and description of genome variation and is providing us with the tools for understanding genome evolution, adaptation and selection, and the underlying genetics of complex traits.