Mai Qing, Zhang Xin
Department of Statistics, Florida State University, Tallahassee, Florida.
Biometrics. 2019 Sep;75(3):734-744. doi: 10.1111/biom.13043. Epub 2019 Apr 9.
Modeling the relationship between two sets of high-dimensional, potentially highly correlated measurements is of growing interest. Canonical correlation analysis (CCA) is a classical tool that explores the dependence between two multivariate random variables by extracting canonical pairs of highly correlated linear combinations. Driven by applications in genomics, text mining, and imaging research, among others, many recent studies generalize CCA to high-dimensional settings. However, most of them either rely on strong assumptions on the covariance matrices or do not produce nested solutions. We propose a new sparse CCA (SCCA) method that recasts high-dimensional CCA as an iterative penalized least squares problem. This formulation lets our method estimate the sparse CCA directions directly with efficient algorithms. Therefore, in contrast to some existing methods, the new SCCA does not impose any sparsity assumptions on the covariance matrices. SCCA is also flexible: it can be easily combined with properly chosen penalty functions to perform structured variable selection and incorporate prior information. Moreover, SCCA produces nested solutions, which is a great convenience in practice. Theoretical results show that SCCA consistently estimates the true canonical pairs with overwhelming probability in ultra-high dimensions. Numerical results also demonstrate the competitive performance of SCCA.
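As a rough illustration of the least squares view of CCA that the abstract refers to, the Python sketch below (assuming NumPy) computes classical, unpenalized CCA in two ways: through an SVD of the whitened cross-covariance, and through alternating ordinary least squares regressions for the first canonical pair. It is not the authors' SCCA algorithm: it includes no penalty and targets low-dimensional data, and the function names `cca_svd` and `first_pair_als` as well as the simulated data are hypothetical, chosen only for this example.

```python
import numpy as np


def cca_svd(X, Y):
    """Classical CCA via SVD of the whitened sample cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse symmetric square root; requires S to be positive definite.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, d, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    # Columns of A and B are canonical directions; d holds the correlations.
    return Wx @ U, Wy @ Vt.T, d


def first_pair_als(X, Y, n_iter=200, seed=0):
    """First canonical pair by alternating ordinary least squares.

    With b fixed, regress Y @ b on X to update a; with a fixed, regress
    X @ a on Y to update b. Each direction is rescaled so that its
    projection has unit sample variance.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(Y.shape[1])
    for _ in range(n_iter):
        b /= np.sqrt((Y @ b).var(ddof=1))
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        a /= np.sqrt((X @ a).var(ddof=1))
        b, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
    b /= np.sqrt((Y @ b).var(ddof=1))
    rho = np.corrcoef(X @ a, Y @ b)[0, 1]
    return a, b, rho


# Simulated data (illustrative only): one shared latent signal z drives
# the first coordinate of both X and Y.
rng = np.random.default_rng(1)
n, p, q = 500, 6, 4
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, 0] += 2 * z
Y[:, 0] += 2 * z

A, B, d = cca_svd(X, Y)
a, b, rho = first_pair_als(X, Y)
print("first canonical correlation (SVD):", d[0])
print("first canonical correlation (ALS):", rho)
```

On data like this, the two estimates of the first canonical correlation should agree closely. In a sparse, high-dimensional variant one would presumably replace each least squares step with a penalized regression such as the lasso; that remark describes the general idea only and is an assumption, not the procedure developed in the paper.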