annot8r: GO, EC and KEGG annotation of EST datasets.

Authors

Schmid Ralf, Blaxter Mark L

Affiliation

Department of Biochemistry, University of Leicester, Lancaster Road, Leicester LE1 9HN, UK.

Publication information

BMC Bioinformatics. 2008 Apr 9;9:180. doi: 10.1186/1471-2105-9-180.

DOI:10.1186/1471-2105-9-180

PMID:18400082

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2324097/

Abstract

BACKGROUND

The expressed sequence tag (EST) methodology is an attractive option for generating sequence data from species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. There is therefore a need for user-friendly tools that annotate non-model species EST datasets with well-defined ontologies and so enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO terms, EC numbers and KEGG pathways.

RESULTS

annot8r automatically downloads all files relevant to the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotations, and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated, and annot8r runs BLAST searches against the three subsets. The BLAST results are parsed and the corresponding annotations are retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database, which facilitates more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools.
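The parse-and-lookup step described above can be sketched roughly as follows. This is an illustration only, not annot8r's own code: it assumes the standard 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, E-value in column 11) and that subject IDs are plain accessions. The `REFERENCE_GO` dictionary, the accessions, the term assignments, the E-value cutoff and all function names are hypothetical stand-ins for the reference database and its contents.

```python
import csv
import io

# Hypothetical stand-in for the reference database: accession -> GO terms.
# The accessions and term assignments are invented for illustration.
REFERENCE_GO = {
    "REF0001": ["GO:0006096", "GO:0004807"],
    "REF0002": ["GO:0005524"],
}


def parse_blast_tabular(handle, max_evalue=1e-8):
    """Yield (query, subject, evalue) for hits at or below the E-value cutoff.

    Assumes 12 tab-separated columns per hit; the cutoff is an arbitrary
    example value, not a setting taken from annot8r.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            yield query, subject, evalue


def annotate(hits, reference):
    """Map each query to the sorted terms attached to its accepted hits."""
    annotations = {}
    for query, subject, _evalue in hits:
        annotations.setdefault(query, set()).update(reference.get(subject, []))
    return {query: sorted(terms) for query, terms in annotations.items()}


def write_flat_file(annotations, handle):
    """Write one tab-separated line per (query, term) pair."""
    for query in sorted(annotations):
        for term in annotations[query]:
            handle.write(f"{query}\t{term}\n")


# Made-up BLAST hits: the second hit fails the E-value cutoff.
blast_output = io.StringIO(
    "EST001\tREF0001\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-40\t150\n"
    "EST001\tREF0002\t40.0\t90\t54\t0\t1\t90\t10\t99\t0.5\t30\n"
    "EST002\tREF0002\t78.0\t110\t24\t0\t1\t110\t3\t112\t2e-30\t120\n"
)
result = annotate(parse_blast_tabular(blast_output), REFERENCE_GO)
```

The same pattern would apply to the EC and KEGG subsets by swapping in a different reference mapping; in a real pipeline the mapping would be looked up in the reference database rather than held in a dictionary.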

CONCLUSION

annot8r rapidly and efficiently assigns GO, EC and KEGG annotations to datasets from EST sequencing projects. Its underlying relational database, flexibility and ease of use make it well suited to EST sequencing projects on non-model species.
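As a rough illustration of the kind of search a relational results database makes easy, the sketch below counts distinct sequences per GO term under an E-value threshold. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table name, column names and data (`go_annotation`, `seq_id`, `go_term`, `evalue`) are hypothetical; they do not reflect annot8r's actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL results database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE go_annotation (seq_id TEXT, go_term TEXT, evalue REAL)"
)

# Invented example rows: (sequence ID, GO term, E-value of the supporting hit).
rows = [
    ("EST001", "GO:0005524", 1e-40),
    ("EST002", "GO:0005524", 2e-30),
    ("EST002", "GO:0006096", 5e-12),
    ("EST003", "GO:0006096", 3e-9),
]
conn.executemany("INSERT INTO go_annotation VALUES (?, ?, ?)", rows)


def sequences_per_term(connection, max_evalue):
    """Count distinct sequences per GO term among hits at or below max_evalue."""
    cursor = connection.execute(
        """
        SELECT go_term, COUNT(DISTINCT seq_id) AS n_seqs
        FROM go_annotation
        WHERE evalue <= ?
        GROUP BY go_term
        ORDER BY n_seqs DESC, go_term
        """,
        (max_evalue,),
    )
    return cursor.fetchall()


counts = sequences_per_term(conn, 1e-10)
```

A filter-and-aggregate query like this would take extra scripting against flat files, which is one plausible reading of the "more advanced searches" the abstract mentions.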

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac10/2324097/698511d1abff/1471-2105-9-180-1.jpg
