Stanke Mario, Tzvetkova Ana, Morgenstern Burkhard
Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstrasse, 37077 Göttingen, Germany.
Genome Biol. 2006;7 Suppl 1(Suppl 1):S11.1-8. doi: 10.1186/gb-2006-7-s1-s11. Epub 2006 Aug 7.
A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.
AUGUSTUS can be used as an ab initio program, that is, as a program that uses only a single genomic sequence as input. In addition, it can combine information from the genomic sequence under study with external hints from various sources. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs, AUGUSTUS predicted significantly more genes correctly than any other program. At the same time, it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS improved further when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, were taken into account.
AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover, it is very flexible because it can take information from several sources into consideration simultaneously.