Gaasterland T, Sczyrba A, Thomas E, Aytekin-Kurban G, Gordon P, Sensen C W
The Rockefeller University, Laboratory of Computational Genomics, New York, New York 10021, USA.
Genome Res. 2000 Apr;10(4):502-10. doi: 10.1101/gr.10.4.502.
Our challenge in annotating the 2.91-Mb Adh region of the Drosophila melanogaster genome was to identify genetic and genomic features automatically, completely, and precisely within a 6-week period. To do so, we augmented the MAGPIE microbial genome annotation system to handle eukaryotic genomic sequence data. The new configuration required the integration of eukaryotic gene-finding tools and DNA repeat tools into the automatic data collection module. It also required us to define in MAGPIE new strategies to combine data about eukaryotic exon predictions with functional data to refine the exon predictions. At the heart of the resulting new eukaryotic genome annotation system is a reverse comparison of public protein and complementary DNA sequences against the input genome to identify missing exons and to refine exon boundaries. The software modules that add eukaryotic genome annotation capability to MAGPIE are available as EGRET (Eukaryotic Genome Rapid Evaluation Tool).
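The reverse-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the MAGPIE or EGRET implementation, which the abstract does not detail. It assumes that exon predictions and protein/cDNA alignment hits have already been reduced to half-open genome intervals on one strand. The names `reconcile`, `overlaps`, and `Interval` are hypothetical, and snapping boundaries to the span of supporting hits is a simplifying assumption.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on the genome (an assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def reconcile(
    predicted: List[Interval], evidence: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Compare predicted exons with alignment evidence.

    Returns (refined, missing):
      refined: predicted exons, with boundaries snapped to the span of the
               overlapping evidence hits where any exist
      missing: evidence hits that overlap no predicted exon, i.e. candidate
               exons the gene finder did not report
    """
    refined: List[Interval] = []
    for exon in predicted:
        hits = [h for h in evidence if overlaps(exon, h)]
        if hits:
            # Simplifying assumption: trust the evidence span over the prediction.
            refined.append((min(h[0] for h in hits), max(h[1] for h in hits)))
        else:
            # No supporting evidence: keep the prediction unchanged.
            refined.append(exon)
    missing = [h for h in evidence if not any(overlaps(h, e) for e in predicted)]
    return refined, missing


if __name__ == "__main__":
    predicted_exons = [(100, 200), (400, 500)]
    alignment_hits = [(110, 205), (300, 350)]
    refined, missing = reconcile(predicted_exons, alignment_hits)
    print("refined:", refined)
    print("missing:", missing)
```

A real pipeline would also need to weigh hit quality, strand, and splice-site signals before accepting a boundary change; this sketch only shows the interval bookkeeping.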