Automatic annotation of eukaryotic genes, pseudogenes and promoters.

Author information

Solovyev Victor, Kosarev Peter, Seledsov Igor, Vorobyev Denis

Affiliation

Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK.

Publication information

Genome Biol. 2006;7 Suppl 1(Suppl 1):S10.1-12. doi: 10.1186/gb-2006-7-s1-s10. Epub 2006 Aug 7.

DOI:10.1186/gb-2006-7-s1-s10

PMID:16925832

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1810547/

Abstract

BACKGROUND

The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation.

RESULTS

The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. Fprom can also be used to identify transcription start sites upstream of the annotated coding parts of genes found by gene prediction software.
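The nucleotide-level figures above can be illustrated with a minimal sketch. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated coding nucleotides that are predicted and "specificity" is the fraction of predicted coding nucleotides that are annotated; the abstract itself does not spell out these definitions. The function names and the toy coordinates are hypothetical and are not taken from the Softberry software or the EGASP evaluation.

```python
from typing import Iterable, Set, Tuple

# Half-open interval [start, end) in sequence coordinates (an assumed convention).
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of nucleotide positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(
    annotated: Iterable[Interval], predicted: Iterable[Interval]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / annotated coding nucleotides
    specificity = TP / predicted coding nucleotides (assumed definition)
    """
    ann = covered_positions(annotated)
    pred = covered_positions(predicted)
    true_positives = len(ann & pred)
    sensitivity = true_positives / len(ann) if ann else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


def bp_per_false_positive(false_positives: int, scanned_bp: int) -> float:
    """Average number of scanned base pairs per false positive prediction."""
    if false_positives == 0:
        return float("inf")
    return scanned_bp / false_positives


if __name__ == "__main__":
    # Toy data: two annotated exons and three predicted exons.
    annotated_exons = [(100, 200), (300, 400)]
    predicted_exons = [(110, 200), (300, 390), (500, 520)]
    sn, sp = nucleotide_accuracy(annotated_exons, predicted_exons)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
    # Five false positives over 10,000 scanned bp is one per 2,000 bp.
    print(bp_per_false_positive(5, 10_000))
```

Expanding intervals into position sets keeps the sketch short and handles overlapping exons correctly, but it is only practical for small regions; a genome-scale evaluation would more likely merge and intersect sorted intervals.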

CONCLUSION

We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.
