Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA).

Author information

Frost H Robert

Affiliation

Department of Biomedical Data Science, Dartmouth College.

Publication information

J Comput Graph Stat. 2022;31(2):486-501. doi: 10.1080/10618600.2021.1987254. Epub 2021 Nov 12.

Abstract

We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
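The identity the abstract refers to can be checked numerically. For a symmetric (real Hermitian) matrix A with distinct eigenvalues, the squared j-th entry of the i-th unit eigenvector equals the product of (lambda_i(A) - lambda_k(M_j)) over the eigenvalues of the minor M_j (A with row and column j removed), divided by the product of (lambda_i(A) - lambda_k(A)) over k != i. The sketch below is only an illustration of that identity on a small simulated covariance matrix; it is not the EESPCA procedure, whose approximation and thresholding steps are not described in this abstract. The function name `squared_loadings_from_eigenvalues` and the simulated data are assumptions made for the example.

```python
import numpy as np

def squared_loadings_from_eigenvalues(A, i):
    """Squared entries of the i-th unit eigenvector of a symmetric matrix A,
    computed only from eigenvalues of A and of its principal minors.
    Hypothetical helper; assumes the eigenvalues of A are distinct."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues of A in ascending order
    # Denominator: product over k != i of (lambda_i(A) - lambda_k(A)).
    denom = np.prod(np.delete(lam[i] - lam, i))
    out = np.empty(n)
    for j in range(n):
        # Minor M_j: A with row j and column j removed.
        minor = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(minor)
        # Numerator: product over k of (lambda_i(A) - lambda_k(M_j)).
        out[j] = np.prod(lam[i] - mu) / denom
    return out

# Simulated data (an assumption for illustration): 50 samples, 5 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
S = np.cov(X, rowvar=False)

i = S.shape[0] - 1  # index of the largest eigenvalue, i.e. the first PC
approx = squared_loadings_from_eigenvalues(S, i)

# Reference values from a direct eigendecomposition.
_, V = np.linalg.eigh(S)
exact = V[:, i] ** 2
```

Computing every minor's full spectrum, as done here, costs more than a direct eigendecomposition, so this form serves only as a check on the identity and says nothing about the speed of the published method.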
