Kanehisa Minoru, Goto Susumu, Kawashima Shuichi, Okuno Yasushi, Hattori Masahiro
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D277-80. doi: 10.1093/nar/gkh063.
A grand challenge in the post-genomic era is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity of cellular processes and organism behavior from genomic information. Toward this end we have been developing a knowledge-based approach for network prediction, which is to predict, given a complete set of genes in the genome, the protein interaction networks that are responsible for various cellular processes. KEGG at http://www.genome.ad.jp/kegg/ is the reference knowledge base that integrates current knowledge on molecular interaction networks such as pathways and complexes (PATHWAY database), information about genes and proteins generated by genome projects (GENES/SSDB/KO databases) and information about biochemical compounds and reactions (COMPOUND/GLYCAN/REACTION databases). These three types of database actually represent three graph objects, called the protein network, the gene universe and the chemical universe. New efforts are being made to abstract knowledge, both computationally and manually, about ortholog clusters in the KO (KEGG Orthology) database, and to collect and analyze carbohydrate structures in the GLYCAN database.
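The abstract names network prediction and the graph objects but does not say how prediction is carried out. The following is a minimal sketch of one way it could work, not the authors' implementation. It assumes the reference protein network is a graph whose nodes are ortholog groups, and that each gene in a genome has already been assigned to one ortholog group. The `Graph` class, the `predict_network` function, and every identifier such as `ortholog_1` or `gene_a` are hypothetical placeholders, not KEGG data or KEGG software.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Undirected graph stored as a node -> set-of-neighbors mapping."""
    adjacency: dict[str, set[str]] = field(default_factory=dict)

    def add_edge(self, a: str, b: str) -> None:
        # Record the edge in both directions.
        self.adjacency.setdefault(a, set()).add(b)
        self.adjacency.setdefault(b, set()).add(a)

    def induced_subgraph(self, nodes: set[str]) -> "Graph":
        # Keep only the requested nodes and the edges among them.
        sub = Graph()
        for node in nodes:
            if node not in self.adjacency:
                continue
            sub.adjacency.setdefault(node, set())
            for neighbor in self.adjacency[node] & nodes:
                sub.add_edge(node, neighbor)
        return sub


def predict_network(reference: Graph, gene_to_ortholog: dict[str, str]) -> Graph:
    """Project a reference network onto one genome (assumed approach).

    An ortholog group counts as present when at least one gene in the
    genome is assigned to it; the prediction is the part of the
    reference network spanned by the groups that are present.
    """
    present = set(gene_to_ortholog.values())
    return reference.induced_subgraph(present)


# Hypothetical reference network: a chain of four ortholog groups.
reference = Graph()
reference.add_edge("ortholog_1", "ortholog_2")
reference.add_edge("ortholog_2", "ortholog_3")
reference.add_edge("ortholog_3", "ortholog_4")

# Hypothetical genome with no gene assigned to ortholog_3.
gene_to_ortholog = {
    "gene_a": "ortholog_1",
    "gene_b": "ortholog_2",
    "gene_c": "ortholog_4",
}

predicted = predict_network(reference, gene_to_ortholog)
for node in sorted(predicted.adjacency):
    print(node, sorted(predicted.adjacency[node]))
```

In this toy genome the missing `ortholog_3` breaks the chain, so `ortholog_4` appears in the predicted network without any interaction partner. A gap of this kind is what a projection from the gene universe onto the protein network would expose.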