Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland.
BMC Bioinformatics. 2012 Sep 17;13:237. doi: 10.1186/1471-2105-13-237.
Yeasts are a model system for exploring eukaryotic genome evolution. Next-generation sequencing technologies are poised to vastly increase the number of yeast genome sequences, both from resequencing projects (population studies) and from de novo sequencing projects (new species). However, the annotation of genomes presents a major bottleneck for de novo projects, because it still relies on a process that is largely manual.
Here we present the Yeast Genome Annotation Pipeline (YGAP), an automated system designed specifically for new yeast genome sequences lacking transcriptome data. YGAP performs automatic de novo annotation, exploiting homology and synteny information from other yeast species stored in the Yeast Gene Order Browser (YGOB) database. The basic premises underlying YGAP's approach are that data from other species already tell us which genes to expect in any particular genomic region, and that orthologous genes are likely to have similar intron/exon structures. YGAP can also detect probable frameshift sequencing errors and propose corrections for them. It searches intelligently for introns, and detects tRNA genes and Ty-like elements.
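The synteny premise can be illustrated with a small sketch. This is not YGAP's actual code; it only shows how the gene order of a reference species might be used to list genes expected between two already-annotated neighbours in a new genome. The function name `missing_by_synteny`, the `max_gap` cutoff, and the gene names are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def missing_by_synteny(
    reference_order: List[str],
    annotated_orthologs: List[str],
    max_gap: int = 5,
) -> List[Tuple[str, str, List[str]]]:
    """List reference genes expected between adjacent annotated genes.

    reference_order: genes along one reference chromosome, in order.
    annotated_orthologs: reference orthologs of the genes already found
        in the new genome, in the order they occur there.
    max_gap: hypothetical cutoff; neighbour pairs that are further apart
        than this in the reference are treated as rearrangement
        breakpoints and skipped.
    """
    position = {gene: idx for idx, gene in enumerate(reference_order)}
    found = set(annotated_orthologs)
    gaps = []
    for left, right in zip(annotated_orthologs, annotated_orthologs[1:]):
        if left not in position or right not in position:
            continue  # no reference position for one of the anchors
        lo, hi = sorted((position[left], position[right]))
        if hi - lo - 1 > max_gap:
            continue  # anchors too far apart to be a conserved block
        candidates = [g for g in reference_order[lo + 1:hi] if g not in found]
        if candidates:
            gaps.append((left, right, candidates))
    return gaps


# Hypothetical gene names, for illustration only.
reference = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
annotated = ["GENE_A", "GENE_C", "GENE_E"]
for left, right, expected in missing_by_synteny(reference, annotated):
    print(f"between {left} and {right}: expect {expected}")
```

In this toy case the sketch reports `GENE_B` between `GENE_A` and `GENE_C`, and `GENE_D` between `GENE_C` and `GENE_E`. A real pipeline would then search those intervals for homologs of the expected genes.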
In tests on Saccharomyces cerevisiae and on the genomes of Naumovozyma castellii and Tetrapisispora blattae newly sequenced with Roche-454 technology, YGAP outperformed another popular annotation program (AUGUSTUS). For S. cerevisiae and N. castellii, 91-93% of YGAP's predicted gene structures were identical to those in previous manually curated gene sets. YGAP has been implemented as a webserver with a user-friendly interface at http://wolfe.gen.tcd.ie/annotation.
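To make the "identical gene structures" figure concrete, the following sketch shows one way such a percentage could be computed. It assumes each gene structure is a tuple of chromosome, strand, and exon coordinates, and that a prediction counts as identical only if all three match a curated gene exactly. The representation, the function name `fraction_identical`, and the coordinates are assumptions for illustration, not the evaluation procedure used in the paper.

```python
from typing import Iterable, Tuple

# (chromosome, strand, ((exon_start, exon_end), ...)); an assumed layout.
GeneStructure = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def fraction_identical(
    predicted: Iterable[GeneStructure],
    curated: Iterable[GeneStructure],
) -> float:
    """Fraction of predicted structures that exactly match a curated one."""
    predicted = list(predicted)
    if not predicted:
        return 0.0
    curated_set = set(curated)
    matches = sum(1 for structure in predicted if structure in curated_set)
    return matches / len(predicted)


# Made-up coordinates, for illustration only.
curated = [
    ("chr1", "+", ((100, 400),)),
    ("chr1", "-", ((900, 1000), (1100, 1500))),
    ("chr2", "+", ((50, 350),)),
]
predicted = [
    ("chr1", "+", ((100, 400),)),                # exact match
    ("chr1", "-", ((900, 1000), (1100, 1500))),  # exact match, one intron
    ("chr2", "+", ((80, 350),)),                 # start coordinate differs
    ("chr2", "-", ((700, 900),)),                # no curated counterpart
]
print(f"{fraction_identical(predicted, curated):.0%}")  # prints "50%"
```

Here the denominator is the number of predicted genes, which follows the wording of the abstract; using the curated set as the denominator would instead measure how many curated genes were recovered.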