Laha Nilanjana, Mukherjee Rajarshi
Department of Statistics, Texas A&M University, College Station, TX 77843.
Department of Biostatistics, Harvard T. H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115.
IEEE Trans Inf Theory. 2023 Mar;69(3):1695-1738. doi: 10.1109/tit.2022.3214201. Epub 2022 Oct 17.
In this paper, we consider asymptotically exact support recovery in the context of high dimensional and sparse Canonical Correlation Analysis (CCA). Our main results describe four regimes of interest based on information theoretic and computational considerations. In regimes of "low" sparsity we describe a simple, general, and computationally easy method for support recovery, whereas in a regime of "high" sparsity, it turns out that support recovery is information theoretically impossible. For the sake of information theoretic lower bounds, our results also demonstrate a non-trivial requirement on the "minimal" size of the nonzero elements of the canonical vectors that is required for asymptotically consistent support recovery. Subsequently, the regime of "moderate" sparsity is further divided into two subregimes. In the lower of the two sparsity regimes, we show that polynomial time support recovery is possible by using a sharp analysis of a co-ordinate thresholding [1] type method. In contrast, in the higher end of the moderate sparsity regime, appealing to the "Low Degree Polynomial" Conjecture [2], we provide evidence that polynomial time support recovery methods are inconsistent. Finally, we carry out numerical experiments to compare the efficacy of various methods discussed.
在本文中,我们考虑高维稀疏典型相关分析(CCA)背景下的渐近精确支持恢复。我们的主要结果基于信息论和计算方面的考虑描述了四种感兴趣的情况。在“低”稀疏度情况下,我们描述了一种简单、通用且计算简便的支持恢复方法,而在“高”稀疏度情况下,事实证明从信息论角度来看支持恢复是不可能的。为了得到信息论下界,我们的结果还表明了对于渐近一致支持恢复所需的典型向量非零元素“最小”规模的一个重要要求。随后,“中等”稀疏度情况进一步分为两个子情况。在两个稀疏度情况中较低的那个情况下,我们表明通过对一种坐标阈值化[1]类型方法进行精确分析,多项式时间支持恢复是可行的。相比之下,在中等稀疏度情况的较高端,借助“低次多项式”猜想[2],我们提供证据表明多项式时间支持恢复方法是不一致的。最后,我们进行数值实验以比较所讨论的各种方法的有效性。