Morales-Jimenez David, Johnstone Iain M, McKay Matthew R, Yang Jeha
ECIT Institute, Queen's University Belfast, UK.
Department of Statistics, Stanford University, USA.
Stat Sin. 2021 Apr;31(2):571-601. doi: 10.5705/ss.202019.0052.
Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond "null models", which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n, of number of variables p to sample size n, converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.
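The comparison in the last two sentences can be illustrated with a small Monte Carlo sketch. It is not the authors' code. The setup is assumed purely for illustration: Gaussian data with known zero mean (so no centering), a population correlation matrix equal to the identity plus one equicorrelated k x k block (a single spike), and arbitrary values for n, p, k, r and the number of replications. The function name leading_eigs is hypothetical.

```python
import numpy as np


def leading_eigs(n=200, p=100, k=10, r=0.9, reps=200, seed=0):
    """Simulate the largest eigenvalue of the sample covariance and
    sample correlation matrices under an assumed single-spike model."""
    rng = np.random.default_rng(seed)

    # Assumed population matrix: unit diagonal, correlation r among the
    # first k variables, all other variables independent.
    sigma = np.eye(p)
    sigma[:k, :k] = r + (1.0 - r) * np.eye(k)
    chol = np.linalg.cholesky(sigma)

    cov_top = np.empty(reps)
    corr_top = np.empty(reps)
    for i in range(reps):
        # n observations of p variables with covariance sigma.
        x = rng.standard_normal((n, p)) @ chol.T
        # Sample covariance without centering (mean assumed known to be 0).
        s = x.T @ x / n
        # Sample correlation: rescale rows and columns by 1/sqrt(diagonal).
        d = 1.0 / np.sqrt(np.diag(s))
        c = s * np.outer(d, d)
        # eigvalsh returns eigenvalues in ascending order.
        cov_top[i] = np.linalg.eigvalsh(s)[-1]
        corr_top[i] = np.linalg.eigvalsh(c)[-1]
    return cov_top, corr_top


if __name__ == "__main__":
    cov_top, corr_top = leading_eigs()
    print("covariance:  mean %.3f, std %.3f" % (cov_top.mean(), cov_top.std()))
    print("correlation: mean %.3f, std %.3f" % (corr_top.mean(), corr_top.std()))
```

Under these assumptions the two leading eigenvalues should have similar averages while the correlation-based one varies less across replications. This is a numerical illustration only and does not reproduce the asymptotic distributions derived in the paper.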