Wei Yingying
Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong.
Cancer Inform. 2015 May 14;14(Suppl 2):173-81. doi: 10.4137/CIN.S17303. eCollection 2015.
It has become increasingly common for large-scale public data repositories and clinical settings to have multiple types of data, including high-dimensional genomics, epigenomics, and proteomics data as well as survival data, measured simultaneously for the same group of biological samples. This provides unprecedented opportunities to understand cancer mechanisms more comprehensively and to develop new cancer therapies. Nevertheless, translating this wealth of data into biologically and clinically meaningful information remains very challenging. In this paper, I review recent developments in statistics for integrative analyses of cancer data. Topics include meta-analysis of a homogeneous data type across multiple studies, integration of multiple heterogeneous genomic data types, survival analysis with high- or ultrahigh-dimensional genomic profiles, and cross-data-type prediction where both predictors and responses are high- or ultrahigh-dimensional vectors. I compare existing statistical methods and comment on potential future research problems.