Kanehisa Minoru, Limviphuvadh Vachiranee, Tanabe Mao
The large-scale datasets generated by gene sequencing, proteomics, and other high-throughput experimental technologies are the bases for understanding life as a molecular system and for developing medical, industrial, and other practical applications. To facilitate bioinformatics analysis of such large-scale datasets, it is essential to organize our knowledge of higher-level systemic functions in a computable form, so that it can be used as a reference for inferring molecular systems from the information contained in the building blocks. Thus, we have been developing the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http://www.genome.jp/kegg/), an integrated resource of about 20 databases (1). The main component is the KEGG PATHWAY database, consisting of manually drawn graphical diagrams of molecular networks, called pathway maps, which represent various cellular processes and organism behaviors. KEGG PATHWAY is a reference database for pathway mapping, which is the process of matching, for example, the genomic or transcriptomic content of genes against KEGG reference pathway maps to infer systemic functions of the cell or the organism. As part of the KEGG PATHWAY database, we organize disease pathway maps representing our knowledge of causative genes and the molecular networks related to them for human diseases, including cancers, immune disorders, neurodegenerative diseases, metabolic disorders, and infectious diseases. Here we focus on neurodegenerative diseases, which were among the first to be made available in the KEGG PATHWAY database. A diverse range of neurodegenerative diseases is commonly characterized by the accumulation of abnormal protein aggregates. Causative genes, including those that produce abnormal proteins, have been identified in various neurodegenerative diseases. However, the current information is not sufficient to find common molecular mechanisms of the diseases.
In this chapter we first present an overview of KEGG, including the KEGG DISEASE and KEGG DRUG databases, and describe the KEGG PATHWAY maps for six neurodegenerative diseases: Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS), Huntington’s disease (HD), dentatorubropallidoluysian atrophy (DRPLA), and prion diseases (PRION). We then present bioinformatics analysis to combine and expand these pathway maps toward identification of common proteins and common interactions, which may lead to a better understanding of common molecular pathogenic mechanisms (2).
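As a rough illustration of what combining pathway maps to find common proteins can look like computationally, the following minimal sketch treats each disease pathway map as a set of protein identifiers and reports the identifiers shared between maps. It is an assumption-laden toy, not the analysis performed in this chapter: the function names (`shared_proteins`, `pairwise_overlap`) are hypothetical, and the identifiers `p1` to `p5` are placeholders rather than actual KEGG content.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: placeholder identifiers, NOT real KEGG pathway content.
pathway_maps = {
    "AD": {"p1", "p2", "p3"},
    "PD": {"p2", "p3", "p4"},
    "ALS": {"p3", "p5"},
}


def shared_proteins(maps, min_maps=2):
    """Return {protein: number of maps} for proteins found in at least `min_maps` maps."""
    counts = Counter(p for members in maps.values() for p in members)
    return {p: n for p, n in counts.items() if n >= min_maps}


def pairwise_overlap(maps):
    """Return the set of proteins shared by each pair of pathway maps."""
    return {(a, b): maps[a] & maps[b] for a, b in combinations(sorted(maps), 2)}


common = shared_proteins(pathway_maps)
overlaps = pairwise_overlap(pathway_maps)

for protein, n in sorted(common.items()):
    print(f"{protein} appears in {n} maps")
for (a, b), members in overlaps.items():
    print(f"{a} / {b}: {sorted(members)}")
```

Representing each map as a set keeps the comparison to plain intersections and counts. Finding common interactions could follow the same pattern with pairs of identifiers as set elements instead of single identifiers.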