Stanke Mario, Diekhans Mark, Baertsch Robert, Haussler David
Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
Bioinformatics. 2008 Mar 1;24(5):637-44. doi: 10.1093/bioinformatics/btn013. Epub 2008 Jan 24.
Computational annotation of protein coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and do poorly with certain types of genes. Including additional sources of evidence of the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated, and aligned to the target genome provide evidence of existence and structure of genes.
We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are gene and transcript annotations from related species syntenically mapped to the target genome using TransMap, evolutionary conservation of DNA, mRNA and ESTs of the target species, and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we were able to predict at least one splice form exactly in 57% of human genes. When evidence from other species and human mRNAs is also used, this number rises to 77%. Syntenic mapping is well suited to annotating genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than as independent positionwise information.
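The accuracy figures above are gene-level: a gene counts as correct if at least one of its annotated splice forms is predicted exactly. The following is a minimal sketch of how such a measure could be computed, not the evaluation code used in the study. The transcript representation (chromosome, strand, tuple of coding exon intervals) and the function name `gene_level_sensitivity` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a transcript is identified by its chromosome,
# strand, and the ordered tuple of its coding exon intervals (start, end).
Exon = Tuple[int, int]
Transcript = Tuple[str, str, Tuple[Exon, ...]]


def gene_level_sensitivity(
    annotated: Dict[str, List[Transcript]],
    predicted: Set[Transcript],
) -> float:
    """Fraction of annotated genes for which at least one annotated
    splice form appears, with identical exon boundaries, among the
    predicted transcripts."""
    if not annotated:
        return 0.0
    hits = sum(
        1
        for splice_forms in annotated.values()
        if any(form in predicted for form in splice_forms)
    )
    return hits / len(annotated)


# Toy data (invented for illustration only).
annotated_genes = {
    "geneA": [
        ("chr1", "+", ((100, 200), (300, 400))),
        ("chr1", "+", ((100, 200), (350, 400))),  # alternative splice form
    ],
    "geneB": [("chr1", "-", ((1000, 1100),))],
    "geneC": [("chr2", "+", ((50, 80), (120, 160)))],
}
predicted_transcripts = {
    ("chr1", "+", ((100, 200), (350, 400))),  # matches geneA's second form
    ("chr1", "-", ((1000, 1100),)),           # matches geneB
    ("chr2", "+", ((50, 80), (125, 160))),    # one boundary off: no match
}

sensitivity = gene_level_sensitivity(annotated_genes, predicted_transcripts)
print(f"{sensitivity:.3f}")
```

In this toy example two of the three genes have an exactly matching splice form. The prediction for the third gene differs at a single exon boundary and therefore does not count, which reflects how strict an exact-match criterion is.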
AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu).