Han Fang, Liu Han
Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA.
Bernoulli (Andover). 2017 Feb;23(1):23-57. doi: 10.3150/15-BEJ702. Epub 2016 Sep 27.
The correlation matrix plays a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson's sample correlation matrix. Although Pearson's sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator when facing heavy-tailed distributions with possible outliers. As a robust alternative, Han and Liu (2013b) advocated the use of a transformed version of the Kendall's tau sample correlation matrix for estimating the high-dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall's tau sample correlation matrix and its transformed version proposed in Han and Liu (2013b) for estimating the population Kendall's tau correlation matrix and the latent Pearson's correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of "effective rank" in quantifying the rate of convergence. With regard to the restricted spectral norm, we present, for the first time, a "sign subgaussian condition" that is sufficient to guarantee that the rank-based correlation matrix estimator attains the optimal rate of convergence. In both cases, we do not need any moment condition.
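To make the two estimators concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the pairwise Kendall's tau sample correlation matrix and then applies an entrywise sine transform, sin(pi/2 * tau). The abstract does not spell out the transformation; the sine form is assumed here because it is the standard link between Kendall's tau and the Pearson correlation under elliptical models. The function names are hypothetical, and ties are simply counted as zero.

```python
import math
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau between two equal-length sequences (ties count as 0)."""
    n = len(x)
    total = sum(
        sign((x[i] - x[j]) * (y[i] - y[j]))
        for i, j in combinations(range(n), 2)
    )
    return 2.0 * total / (n * (n - 1))


def kendall_tau_matrix(data):
    """Kendall's tau sample correlation matrix for n rows of d variables."""
    cols = list(zip(*data))
    d = len(cols)
    T = [[1.0] * d for _ in range(d)]
    for j, k in combinations(range(d), 2):
        T[j][k] = T[k][j] = kendall_tau(cols[j], cols[k])
    return T


def transformed_matrix(T):
    """Entrywise sine transform (assumed form): R_jk = sin(pi/2 * tau_jk)."""
    return [[math.sin(math.pi / 2 * t) for t in row] for row in T]


# Second variable is a monotone (exponential) transform of the first,
# so the rank-based estimate is unaffected by the marginal transformation.
monotone = [[x, math.exp(x)] for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
T_monotone = kendall_tau_matrix(monotone)

# Same first variable, but the last observation is a gross outlier.
outlier = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, -1000.0]]
T_outlier = kendall_tau_matrix(outlier)
R_outlier = transformed_matrix(T_outlier)
```

Because the estimate depends on the data only through the signs of pairwise differences, it is invariant to monotone marginal transformations, and the size of an outlier does not matter, only its rank. This is the intuition behind the absence of moment conditions in the results above.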