Eric F. Lock, Katherine A. Hoadley, J. S. Marron, Andrew B. Nobel
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.
Ann Appl Stat. 2013 Mar 1;7(1):523-542. doi: 10.1214/12-AOAS597.
Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types.
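For k data types measured on the same n samples, the three-term decomposition described above can be written as X_i = J_i + A_i + E_i (i = 1, ..., k). Here the stacked joint matrix J = [J_1; ...; J_k] has low rank r, and each individual matrix A_i has low rank r_i. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the ranks are known in advance, skips any centering or scaling of the blocks, and assumes the individual terms are constrained to be orthogonal to the joint row (sample) space. It then alternates between two steps: (a) a rank-r SVD of the stacked data minus the individual terms, and (b) a rank-r_i SVD of each block's remainder after projecting out the joint row space. The function names `low_rank` and `jive_sketch` are hypothetical.

```python
import numpy as np


def low_rank(M, r):
    """Best rank-r approximation of M (Frobenius norm) via truncated SVD."""
    if r == 0:
        return np.zeros_like(M)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def jive_sketch(blocks, r_joint, r_indiv, max_iter=500, tol=1e-9):
    """Alternating sketch of X_i = J_i + A_i + E_i for blocks sharing n columns.

    blocks  : list of (p_i x n) arrays; columns are the common samples.
    r_joint : assumed (user-supplied) rank of the stacked joint matrix J.
    r_indiv : assumed (user-supplied) rank r_i of each individual matrix A_i.
    """
    sizes = [B.shape[0] for B in blocks]
    n = blocks[0].shape[1]
    X = np.vstack(blocks)
    A = np.zeros_like(X)
    splits = np.cumsum(sizes)[:-1]  # row indices where one block ends and the next begins
    prev_loss = np.inf
    for _ in range(max_iter):
        # Joint step: rank-r approximation of the stacked data minus individual parts.
        J = low_rank(X - A, r_joint)
        # Orthonormal basis V (n x r_joint) for the joint row space (sample space).
        V = np.linalg.svd(J, full_matrices=False)[2][:r_joint].T
        P_perp = np.eye(n) - V @ V.T  # projector onto the orthogonal complement of V
        # Individual step (assumed constraint): per-block low-rank fit orthogonal to V.
        A = np.vstack([
            low_rank((Xi - Ji) @ P_perp, ri)
            for Xi, Ji, ri in zip(np.split(X, splits), np.split(J, splits), r_indiv)
        ])
        # Stop when the residual sum of squares stops changing (relative tolerance).
        loss = np.sum((X - J - A) ** 2)
        if abs(prev_loss - loss) < tol * max(loss, 1.0):
            break
        prev_loss = loss
    E = X - J - A  # residual noise term
    return np.split(J, splits), np.split(A, splits), np.split(E, splits)


if __name__ == "__main__":
    # Hypothetical synthetic example: two data types sharing rank-2 joint scores.
    rng = np.random.default_rng(0)
    n = 40
    S = rng.normal(size=(2, n))
    X1 = rng.normal(size=(30, 2)) @ S + rng.normal(size=(30, 1)) @ rng.normal(size=(1, n))
    X2 = rng.normal(size=(20, 2)) @ S + rng.normal(size=(20, 1)) @ rng.normal(size=(1, n))
    J, A, E = jive_sketch([X1, X2], r_joint=2, r_indiv=[1, 1])
    print([np.linalg.matrix_rank(Ai) for Ai in A])
```

Stacking the blocks before the joint SVD is what makes J "joint": one set of rank-r sample scores must explain structure in every data type at once. The orthogonality projection keeps each A_i from re-absorbing variation already assigned to J.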