Mao Xizeng, Cai Tao, Olyarchuk John G, Wei Liping
Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, PR China.
Bioinformatics. 2005 Oct 1;21(19):3787-93. doi: 10.1093/bioinformatics/bti430. Epub 2005 Apr 7.
High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has certain limitations such as the lack of direct association with pathways.
We demonstrated the use of the KEGG Orthology (KO), part of the KEGG suite of resources, as an alternative controlled vocabulary for automated annotation and pathway identification. We developed a KO-Based Annotation System (KOBAS) that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways. Results from both whole genome and microarray gene cluster annotations with KOBAS are comparable and complementary to known annotations. KOBAS is a freely available stand-alone Python program that can contribute significantly to genome annotation and microarray analysis.
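The two outputs described above, the most frequent pathways and the statistically enriched ones, can be illustrated with a minimal sketch. This is not KOBAS code. The abstract does not say which statistical test is used, so the hypergeometric test here is an assumption, as are the function names, the gene-to-pathway mapping, and the toy gene and pathway labels. No multiple-testing correction is applied.

```python
from collections import Counter
from math import comb


def pathway_counts(gene_to_pathways, genes):
    """Count how many of the given genes are annotated to each pathway."""
    counts = Counter()
    for gene in genes:
        # set() guards against a pathway listed twice for one gene
        for pathway in set(gene_to_pathways.get(gene, ())):
            counts[pathway] += 1
    return counts


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes without replacement from M genes,
    of which n belong to the pathway (upper tail of the hypergeometric)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enriched_pathways(gene_to_pathways, sample, background, alpha=0.05):
    """Return (pathway, hits_in_sample, p_value) for pathways with p <= alpha,
    sorted by p-value. Assumes the sample is a subset of the background."""
    sample_counts = pathway_counts(gene_to_pathways, sample)
    background_counts = pathway_counts(gene_to_pathways, background)
    M, N = len(background), len(sample)
    results = []
    for pathway, k in sample_counts.items():
        p = hypergeom_sf(k, M, background_counts[pathway], N)
        if p <= alpha:
            results.append((pathway, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 10 background genes, 3 of them in the sample.
background = [f"g{i}" for i in range(10)]
sample = ["g0", "g1", "g2"]
# pathway_B contains every gene; pathway_A contains only the sample genes.
mapping = {g: ["pathway_B"] for g in background}
for g in sample:
    mapping[g] = ["pathway_A", "pathway_B"]

frequent = pathway_counts(mapping, sample).most_common()
enriched = enriched_pathways(mapping, sample, background)
```

In this toy case both pathways are equally frequent in the sample (three genes each), but only `pathway_A` is enriched, because `pathway_B` covers the whole background. That is the distinction between frequency and enrichment.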