CONVERGENCE AND PREDICTION OF PRINCIPAL COMPONENT SCORES IN HIGH-DIMENSIONAL SETTINGS.

Authors

Lee Seunggeun, Zou Fei, Wright Fred A

Affiliation

University of North Carolina, 3101 McGavran-Greenberg, CB 7420 Chapel Hill, North Carolina 27599.

Publication details

Ann Stat. 2010 Jan 1;38(6):3605-3629. doi: 10.1214/10-AOS821.

Abstract

A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.
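The shrinkage described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure or their bias-adjusted estimator: it assumes a single-spike covariance (identity except for one inflated coordinate), Gaussian data, and illustrative values for the sample size, dimension, and spike size. The function name `simulate_score_shrinkage` and all of its parameters are hypothetical. It projects held-out observations onto the leading sample eigenvector and compares their spread with that of the training scores.

```python
import numpy as np


def simulate_score_shrinkage(n=100, p=2000, spike=50.0, n_test=1000, seed=0):
    """Return sd(naively predicted test PC1 scores) / sd(training PC1 scores).

    Assumed model (illustrative only): covariance is the identity except
    that the first coordinate has variance `spike`.
    """
    rng = np.random.default_rng(seed)

    # Per-coordinate standard deviations under the assumed spiked model.
    scale = np.ones(p)
    scale[0] = np.sqrt(spike)

    train = rng.standard_normal((n, p)) * scale
    test = rng.standard_normal((n_test, p)) * scale

    # Center with the training mean, as one would in practice.
    mean = train.mean(axis=0)

    # Rows of vt are the sample eigenvectors of the training covariance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    v1 = vt[0]

    # Training scores, and naive predictions obtained by projecting new
    # observations onto the sample eigenvector.
    train_scores = (train - mean) @ v1
    test_scores = (test - mean) @ v1

    return test_scores.std() / train_scores.std()


ratio = simulate_score_shrinkage()
print(f"sd(predicted scores) / sd(training scores) = {ratio:.2f}")
```

With many more variables than observations, the printed ratio should fall noticeably below 1, which is the bias towards 0 that the abstract refers to. Shrinking `p` relative to `n` in this sketch should move the ratio back towards 1.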

Similar articles

Asymptotic Theory of Eigenvectors for Random Matrices with Diverging Spikes.
J Am Stat Assoc. 2022;117(538):996-1009. doi: 10.1080/01621459.2020.1840990. Epub 2020 Dec 8.

PCA in High Dimensions: An orientation.
Proc IEEE Inst Electr Electron Eng. 2018 Aug;106(8):1277-1292. doi: 10.1109/JPROC.2018.2846730. Epub 2018 Jul 18.

Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model.
Ann Stat. 2018 Aug;46(4):1742-1778. doi: 10.1214/17-AOS1601. Epub 2018 Jun 27.

Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA).
J Comput Graph Stat. 2022;31(2):486-501. doi: 10.1080/10618600.2021.1987254. Epub 2021 Nov 12.

Cited by

Factor analysis of ancient population genomic samples.
Nat Commun. 2020 Sep 16;11(1):4661. doi: 10.1038/s41467-020-18335-6.

Tree shape-based approaches for the comparative study of cophylogeny.
Ecol Evol. 2019 May 29;9(12):6756-6771. doi: 10.1002/ece3.5185. eCollection 2019 Jun.
