Clarke Robert, Ressom Habtom W, Wang Antai, Xuan Jianhua, Liu Minetta C, Gehan Edmund A, Wang Yue
Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, 3970 Reservoir Road NW, Washington, DC 20057, USA.
Nat Rev Cancer. 2008 Jan;8(1):37-49. doi: 10.1038/nrc2294.
High-throughput genomic and proteomic technologies are widely used in cancer research to build better predictive models of diagnosis, prognosis and therapy, to identify and characterize key signalling networks and to find new targets for drug development. These technologies present investigators with the task of extracting meaningful statistical and biological information from high-dimensional data spaces, wherein each sample is defined by hundreds or thousands of measurements, usually concurrently obtained. The properties of high dimensionality are often poorly understood or overlooked in data modelling and analysis. From the perspective of translational science, this Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.