Wu Cathy H, Huang Hongzhan, Nikolskaya Anastasia, Hu Zhangzhi, Barker Winona C
Georgetown University Medical Center, Washington, DC 20057-1455, USA.
Comput Biol Chem. 2004 Feb;28(1):87-96. doi: 10.1016/j.compbiolchem.2003.10.003.
Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at a global scale across various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties, complementing the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies requires bioinformatics infrastructure that supports both data integration and associative analysis. The iProClass database, designed to serve as a framework for data integration in a distributed networking environment, provides comprehensive descriptions of all proteins, with rich links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy. In particular, the database is organized by PIRSF family classification and maps to other family, function, and structure classification schemes. Coupled with the underlying taxonomic information for complete genomes, the iProClass system (http://pir.georgetown.edu/iproclass/) supports associative studies of protein family, domain, function, and structure. A case study of the phosphoglycerate mutases illustrates a systematic approach to protein family and phylogenetic analysis. Such studies may serve as a basis for further analysis of protein functional evolution and its relationship to the co-evolution of metabolic pathways, cellular networks, and organisms.