Min Eun Jeong, Chi Eric C, Zhou Hua
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, 19104, PA, U.S.A.
Department of Statistics, North Carolina State University, Raleigh, 27695, NC, U.S.A.
Stat. 2020;8(1). doi: 10.1002/sta4.253. Epub 2020 Jan 2.
Canonical correlation analysis (CCA) is a multivariate analysis technique for estimating a linear relationship between two sets of measurements. Modern acquisition technologies, such as those used in neuroimaging and remote sensing, produce data in the form of multidimensional arrays, or tensors. Classical CCA is ill-suited to tensor data because of the multidimensional structure and ultrahigh dimensionality of such data. In this paper, we present tensor CCA (TCCA), which discovers relationships between two tensors while preserving their multidimensional structure and using substantially fewer parameters. Furthermore, we show how to employ a parsimonious covariance structure to gain additional stability and efficiency. We delineate the population and sample problems for each model and propose efficient estimation algorithms with global convergence guarantees. We also describe a probabilistic model for TCCA that enables the generation of synthetic data with desired canonical variates and correlations. Simulation studies illustrate the performance of our methods.
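For readers unfamiliar with the classical method that the abstract takes as its starting point, the following is a minimal sketch of ordinary (vector) CCA in NumPy. It is not the TCCA algorithm proposed in the paper. The function name `classical_cca` and the small ridge term `reg` are assumptions introduced here for illustration only. The sketch whitens both data sets with Cholesky factors of their sample covariances and takes the leading singular triple of the whitened cross-covariance.

```python
import numpy as np


def classical_cca(X, Y, reg=1e-8):
    """First canonical pair for data matrices X (n x p) and Y (n x q).

    Illustrative sketch of classical CCA only; `reg` is a hypothetical
    ridge term added to keep the covariance estimates positive definite.
    Returns weight vectors a (p,), b (q,) and the canonical correlation.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each column
    Yc = Y - Y.mean(axis=0)

    # Sample covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # Leading singular triple gives the first canonical correlation.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to the original coordinates.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]
```

Applying this to tensor data would require flattening each array into a vector first: a 64 x 64 image, for example, becomes a 4096-dimensional vector with a 4096 x 4096 covariance matrix to estimate. That is the loss of structure and growth in dimension that the abstract gives as the reason classical CCA is not appropriate for such data.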