AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome.

Author information

Stanke Mario, Tzvetkova Ana, Morgenstern Burkhard

Affiliation

Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstrasse, 37077 Göttingen, Germany.

Publication information

Genome Biol. 2006;7 Suppl 1(Suppl 1):S11.1-8. doi: 10.1186/gb-2006-7-s1-s11. Epub 2006 Aug 7.

Abstract

BACKGROUND

A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project.

RESULTS

AUGUSTUS can be used as an ab initio program, that is, as a program that uses only a single genomic sequence as input. In addition, it can combine information from the genomic sequence under study with external hints from various sources. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs, AUGUSTUS predicted significantly more genes correctly than any other program. At the same time, it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be improved further when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, were taken into account.

CONCLUSION

AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover, it is very flexible because it can take information from several sources into account simultaneously.
