Kanehisa Minoru
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Genome Inform. 2009 Oct;23(1):212-3.
Twenty years ago, the Human Genome Project was initiated with the aim of uncovering the genetic factors of human diseases and developing new strategies for diagnosis, treatment, and prevention. Despite the successful sequencing of the human genome and the discovery of many disease-related genes, our understanding of molecular mechanisms is still largely incomplete for the majority of diseases. In the KEGG database project we have been organizing our knowledge on cellular functions and organism behaviors in computable forms, especially in the forms of molecular networks (KEGG pathway maps) and hierarchical lists (BRITE functional hierarchies). This computerized knowledge has been widely used as a reference for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies. Our efforts are now focused on human diseases and drugs. We consider diseases as perturbed states of the molecular system that operates the cell and the organism, and drugs as perturbants to the molecular system. Since the existing disease databases are mostly intended for humans to read and understand, we are developing a more computable disease information resource in which our knowledge on diseases is represented as molecular networks or gene/molecule lists. When the detail of the molecular system is relatively well characterized, we use the molecular network representation and draw KEGG pathway maps. The Human Diseases category of the KEGG PATHWAY database contains about 40 pathway maps for cancers, immune disorders, neurodegenerative diseases, etc. When the detail is not known but disease genes have been identified, we use the gene/molecule list representation and create a KEGG DISEASE entry. The entry contains a list of known disease genes and other relevant molecules, including environmental factors, diagnostic markers, and therapeutic drugs. The list simply defines membership in the underlying molecular system, but is still useful for computational analysis.
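To illustrate why a bare membership list can still support computation, the following is a minimal sketch, not the KEGG DISEASE data format. The `DiseaseEntry` class, its field names, the `rank_by_overlap` helper, and all identifiers such as `GENE_A` are hypothetical placeholders. The sketch assumes that an experiment yields a set of gene identifiers and ranks disease entries by the Jaccard similarity between that set and each entry's gene list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class DiseaseEntry:
    """Hypothetical gene/molecule list for one disease (illustration only)."""
    name: str
    genes: FrozenSet[str] = frozenset()
    environmental_factors: FrozenSet[str] = frozenset()
    markers: FrozenSet[str] = frozenset()
    drugs: FrozenSet[str] = frozenset()

    def members(self) -> FrozenSet[str]:
        # The list only records membership; no interactions are encoded.
        return self.genes | self.environmental_factors | self.markers | self.drugs


def rank_by_overlap(observed: Set[str],
                    entries: Iterable[DiseaseEntry]) -> List[Tuple[str, float]]:
    """Rank entries by Jaccard similarity between observed genes and entry genes."""
    scored = []
    for entry in entries:
        union = observed | entry.genes
        score = len(observed & entry.genes) / len(union) if union else 0.0
        scored.append((entry.name, score))
    # Highest score first; ties broken alphabetically by entry name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Placeholder data, not taken from any database.
ENTRIES = [
    DiseaseEntry("disease_X", genes=frozenset({"GENE_A", "GENE_B"}),
                 drugs=frozenset({"drug_1"})),
    DiseaseEntry("disease_Y", genes=frozenset({"GENE_C"})),
]
OBSERVED = {"GENE_A", "GENE_B", "GENE_D"}
RANKING = rank_by_overlap(OBSERVED, ENTRIES)
```

Jaccard similarity is used here only because it is simple. An enrichment statistic could be substituted without changing the list representation.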
In the KEGG DRUG database we capture knowledge on two types of molecular networks. One is the interaction network of drugs with target molecules, metabolizing enzymes, transporters, other drugs, and the pathways involving all these molecules. The other is the chemical structure transformation network, which represents both the biosynthetic pathways of natural products in various organisms and the history of drug development, in which drug structures have been continuously modified by medicinal chemists. KEGG DRUG contains chemical structures and/or chemical components of all prescription and OTC drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, as well as most prescription drugs in the USA and many prescription drugs in Europe. I will report on our strategy for analyzing the chemical architecture of natural products derived from enzymatic reactions (and enzyme genes) and the chemical architecture of marketed drugs derived from human-made organic reactions in the history of drug development, towards drug discovery from the genomes of plants and microorganisms.
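The first network type, the drug interaction network, can be pictured as a set of typed edges between drugs and other molecules. The sketch below is an assumption-laden illustration rather than the KEGG DRUG schema. The edge list, the relation labels (`target`, `enzyme`, `transporter`), the `shared_partners` function, and every identifier are hypothetical. It finds pairs of drugs that share a molecule under a given relation, for example a common metabolizing enzyme.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (drug, relation, molecule)

# Placeholder edges, not taken from any database.
EDGES = [
    ("drug_1", "target", "protein_P"),
    ("drug_1", "enzyme", "enzyme_E"),
    ("drug_2", "enzyme", "enzyme_E"),
    ("drug_2", "transporter", "transporter_T"),
    ("drug_3", "target", "protein_P"),
]


def shared_partners(edges: Iterable[Edge],
                    relation: str) -> Dict[Tuple[str, str], Set[str]]:
    """Map each drug pair to the molecules both drugs touch via `relation`."""
    drugs_by_molecule: Dict[str, Set[str]] = defaultdict(set)
    for drug, rel, molecule in edges:
        if rel == relation:
            drugs_by_molecule[molecule].add(drug)

    pairs: Dict[Tuple[str, str], Set[str]] = {}
    for molecule, drugs in drugs_by_molecule.items():
        # Sorting makes each pair key canonical: (a, b) with a < b.
        for a, b in combinations(sorted(drugs), 2):
            pairs.setdefault((a, b), set()).add(molecule)
    return pairs


SHARED_ENZYMES = shared_partners(EDGES, "enzyme")
SHARED_TARGETS = shared_partners(EDGES, "target")
```

With the placeholder edges above, `SHARED_ENZYMES` links `drug_1` and `drug_2` through `enzyme_E`, which is the kind of relationship that could flag a candidate drug-drug interaction for closer inspection.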