The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.

Authors

Shen Dan, Shen Haipeng, Zhu Hongtu, Marron J S

Affiliations

University of South Florida.

University of Hong Kong.

Publication information

Stat Sin. 2016 Oct;26(4):1747-1770. doi: 10.5705/ss.202015.0088.

Abstract

The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
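
As an informal illustration of the role played by the ratio between the dimension and the product of the sample size with the spike size, the following is a minimal simulation sketch, not the authors' code. It assumes a mean-zero Gaussian single-spike model with covariance diag(spike, 1, ..., 1), so the population eigenvector is the first coordinate axis. The function name `leading_angle_deg` and the values chosen for `d`, `n`, and the ratios are hypothetical choices made for illustration only.

```python
import numpy as np


def leading_angle_deg(d, n, spike, rng):
    """Angle in degrees between the first sample eigenvector and e_1.

    Assumed model (illustrative): n i.i.d. mean-zero Gaussian rows with
    covariance diag(spike, 1, ..., 1) in dimension d.
    """
    # Rows are observations; scaling column 0 gives it variance `spike`.
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(spike)
    # Right singular vectors of X are the eigenvectors of the uncentered
    # sample covariance X^T X / n (the model mean is zero by assumption).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(vt[0, 0]), 1.0)  # |<first sample eigenvector, e_1>|
    return float(np.degrees(np.arccos(cos)))


rng = np.random.default_rng(0)
d, n = 2000, 50  # high dimension, low sample size

results = {}
for ratio in (0.01, 1.0, 10.0):
    # Pick the spike size so that d / (n * spike) equals the target ratio.
    spike = d / (n * ratio)
    results[ratio] = leading_angle_deg(d, n, spike, rng)
    print(f"d/(n*spike) = {ratio:>5}: angle = {results[ratio]:.1f} degrees")
```

In runs of this sketch the angle should be small when the ratio is close to zero and should grow as the ratio increases, in line with the consistency statement in the abstract. Exact values depend on the random seed and on the finite values of `d` and `n`.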




