Perco Paul, Rapberger Ronald, Siehs Christian, Lukas Arno, Oberbauer Rainer, Mayer Gert, Mayer Bernd
Department of Nephrology, Medical University of Vienna, Austria.
Electrophoresis. 2006 Jul;27(13):2659-75. doi: 10.1002/elps.200600064.
Differential gene expression analysis and proteomics have had a significant impact on the elucidation of concerted cellular processes, as the simultaneous measurement of hundreds to thousands of individual objects at the level of RNA and protein ensembles has become technically feasible. The availability of such data sets promised a profound understanding of phenomena at an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or as the characterization of cells and tissue displaying distinct pathophysiological states. However, the step of placing these data into context, i.e., linking distinct expression or abundance patterns to phenotypic observables and, furthermore, enabling a sound biological interpretation at the level of reaction networks and concerted pathways, remains a major shortcoming. This shortcoming certainly stems from the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview of computational procedures for the analysis of genomic and proteomic data and introduces a sequential analysis workflow: explorative statistics to derive a first candidate gene/protein list that is relevant from a purely statistical viewpoint, followed by co-regulation and network analysis to expand this core list biologically toward functional networks and pathways. The review of these procedures is complemented by example applications aimed at identifying disease-associated proteins. Optimization of the computational procedures involved, in conjunction with the continuous increase in additional biological data, clearly has the potential to boost our understanding of processes at a cell-wide level.
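The sequential workflow named in the abstract (a statistical filter that yields a core candidate list, followed by co-regulation analysis that expands it) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure evaluated in the review: the Welch t statistic, the Pearson correlation, both cutoffs, the function names, and the toy expression values are all hypothetical choices made here for demonstration.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic for two independent samples (needs non-zero variance)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def candidate_genes(expr, control_idx, disease_idx, t_cutoff):
    """Step 1 (explorative statistics): keep genes whose |t| reaches the cutoff."""
    core = []
    for gene, values in expr.items():
        control = [values[i] for i in control_idx]
        disease = [values[i] for i in disease_idx]
        if abs(welch_t(disease, control)) >= t_cutoff:
            core.append(gene)
    return core


def coregulated_partners(expr, core, r_cutoff):
    """Step 2 (co-regulation): link each core gene to strongly correlated genes."""
    edges = []
    for seed in core:
        for gene, values in expr.items():
            if gene == seed:
                continue  # skip self-correlation
            r = pearson(expr[seed], values)
            if abs(r) >= r_cutoff:
                edges.append((seed, gene, r))
    return edges


# Toy data, invented for illustration: samples 0-2 are controls, 3-5 disease.
expr = {
    "geneA": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # clear group difference
    "geneB": [2.0, 2.1, 1.9, 2.1, 1.9, 2.0],  # essentially flat
    "geneC": [1.0, 2.0, 0.2, 2.5, 3.8, 1.9],  # noisy, but tracks geneA
}

# Hypothetical cutoffs; in practice these would come from a multiple-testing
# procedure and a permutation-based or otherwise justified correlation threshold.
core = candidate_genes(expr, control_idx=[0, 1, 2], disease_idx=[3, 4, 5], t_cutoff=4.0)
edges = coregulated_partners(expr, core, r_cutoff=0.75)

print("core list:", core)
for seed, gene, r in edges:
    print(f"{seed} -- {gene} (r = {r:.2f})")
```

In this toy example the noisy gene fails the statistical filter on its own but is recovered through its correlation with a core gene, which is the kind of expansion toward functional context that the second workflow step is meant to provide. A real analysis would add multiple-testing correction and would draw on curated interaction or pathway data rather than correlation alone.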