Lee Seunggeun, Zou Fei, Wright Fred A
University of North Carolina, 3101 McGavran-Greenberg, CB 7420 Chapel Hill, North Carolina 27599.
Ann Stat. 2010 Jan 1;38(6):3605-3629. doi: 10.1214/10-AOS821.
A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.
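The shrinkage of naive PC score predictions described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' bias-adjusted estimator; it only demonstrates the naive bias under assumptions chosen for illustration: Gaussian data, a diagonal population covariance with a single spiked eigenvalue, a known zero mean (so no centering), and power iteration standing in for a full eigendecomposition. The function name `simulate_shrinkage` and all parameter values are hypothetical.

```python
import math
import random


def simulate_shrinkage(n=50, p=400, spike=12.0, n_new=200, iters=60, seed=0):
    """Return RMS(new-sample PC1 scores) / RMS(training PC1 scores).

    Assumed setup (illustrative only): rows are N(0, diag(spike, 1, ..., 1)),
    so the population has one spiked eigenvalue and p/n is not small.
    """
    rng = random.Random(seed)
    sd_spike = math.sqrt(spike)

    def draw(m):
        # Only the first coordinate carries the inflated (spiked) variance.
        return [[rng.gauss(0.0, 1.0) * (sd_spike if j == 0 else 1.0)
                 for j in range(p)] for _ in range(m)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def rms(values):
        return math.sqrt(sum(x * x for x in values) / len(values))

    train, new = draw(n), draw(n_new)

    # Power iteration on X^T X for the leading sample eigenvector.
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        scores = [dot(row, v) for row in train]                      # X v
        w = [sum(s * row[j] for s, row in zip(scores, train))
             for j in range(p)]                                      # X^T (X v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]

    # Naive prediction: project new observations onto the sample eigenvector.
    train_scores = [dot(row, v) for row in train]
    new_scores = [dot(row, v) for row in new]
    return rms(new_scores) / rms(train_scores)


if __name__ == "__main__":
    print(f"new/training PC1 score scale: {simulate_shrinkage():.3f}")
```

A ratio noticeably below 1 means that the predicted scores for new observations are, on average, pulled towards 0 relative to the scores of the training sample. Increasing `n` relative to `p` should move the ratio closer to 1.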