KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters.

Author information

Center for Transdisciplinary Research, Niigata University, 1-757 Asahimachi-dori, Chuo-ku, Niigata 951-8585, Japan.

Publication information

Nucleic Acids Res. 2013 Jan;41(Database issue):D353-7. doi: 10.1093/nar/gks1239. Epub 2012 Nov 27.

Abstract

The identification of orthologous genes across a growing number of fully sequenced genomes is a challenging problem in genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1,176,030 OCs, obtained by clustering 8,357,175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying a quasi-clique-based clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. Computing the OCs is efficient, which allows the contents to be updated regularly. KEGG OC has two main features: (i) it covers all complete genomes of a wide variety of organisms from the three domains of life, and the number of organisms is the largest among existing databases; and (ii) it is compatible with the KEGG database, sharing the same sets of genes and identifiers, which allows seamless integration of OCs with useful KEGG components such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.
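As an illustration only, the idea of quasi-clique-based clustering on sequence similarities can be sketched in Python. This is not the KEGG OC implementation: the degree-based quasi-clique definition, the greedy growth strategy, the score threshold, the parameter gamma and all gene identifiers below are assumptions chosen for the example. A set of genes is treated as a gamma-quasi-clique when every member is linked (similarity score at or above the threshold) to at least gamma * (n - 1) of the other n - 1 members.

```python
def build_graph(scores, threshold):
    """Turn pairwise similarity scores into an undirected adjacency map.

    scores: dict mapping (gene_a, gene_b) -> similarity score (hypothetical values).
    An edge is kept only when the score reaches the threshold.
    """
    adj = {}
    for (a, b), score in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b and score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_quasi_clique(nodes, adj, gamma):
    """True if every node is adjacent to at least gamma * (n - 1) other members."""
    n = len(nodes)
    if n <= 1:
        return True
    required = gamma * (n - 1)
    return all(len(adj.get(v, set()) & nodes) >= required for v in nodes)


def greedy_quasi_clique_clusters(adj, gamma):
    """Greedily grow clusters that remain gamma-quasi-cliques (a heuristic sketch)."""
    unassigned = set(adj)
    clusters = []
    for seed in sorted(adj):  # sorted for a deterministic result
        if seed not in unassigned:
            continue
        cluster = {seed}
        grew = True
        while grew:
            grew = False
            for candidate in sorted(unassigned - cluster):
                if is_quasi_clique(cluster | {candidate}, adj, gamma):
                    cluster.add(candidate)
                    grew = True
        unassigned -= cluster
        clusters.append(cluster)
    return clusters


# Toy data: gene names and scores are made up for illustration.
toy_scores = {
    ("a1", "a2"): 250, ("a1", "a3"): 230, ("a2", "a3"): 210,
    ("a1", "a4"): 180, ("a2", "a4"): 170, ("a3", "a4"): 40,
    ("a4", "b1"): 50, ("b1", "b2"): 300,
}
graph = build_graph(toy_scores, threshold=100)
clusters = greedy_quasi_clique_clusters(graph, gamma=0.6)
for cluster in clusters:
    print(sorted(cluster))
```

With gamma = 0.6 the toy genes a1 to a4 end up in one cluster even though a3 and a4 are not directly linked, while b1 and b2 form a second cluster. Setting gamma = 1.0 requires strict cliques and splits a4 off into its own cluster. Tolerating a few missing links in this way is the general motivation for using quasi-cliques rather than cliques.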
