Waters Katrina M, Pounds Joel G, Thrall Brian D
Computational Sciences & Mathematics Division, Pacific Northwest National Laboratory, Mail Stop P7-56 Box 999, Richland WA 99352, USA.
Brief Funct Genomic Proteomic. 2006 Dec;5(4):261-72. doi: 10.1093/bfgp/ell019. Epub 2006 May 10.
The functioning of even a simple biological system is much more complicated than the sum of its genes, proteins and metabolites. A premise of systems biology is that molecular profiling will facilitate the discovery and characterization of important disease pathways. However, as multiple levels of effector pathway regulation appear to be the norm rather than the exception, a significant challenge presented by high-throughput genomics and proteomics technologies is the extraction of the biological implications of complex data. Thus, integration of heterogeneous types of data generated from diverse global technology platforms represents the first challenge in developing the necessary foundational databases needed for predictive modelling of cell and tissue responses. Given the apparent difficulty in defining the correspondence between gene expression and protein abundance measured in several systems to date, how do we make sense of these data and design the next experiment? In this review, we highlight current approaches and challenges associated with integration and analysis of heterogeneous data sets, focusing on global analysis obtained from high-throughput technologies.