A critical assessment of sparse PCA (research): why (one should acknowledge that) weights are not loadings.

Affiliations

Tilburg University, Methods and Statistics, Tilburg, The Netherlands.

KU Leuven, Psychology and Educational Sciences, Leuven, Belgium.

Publication information

Behav Res Methods. 2024 Mar;56(3):1413-1432. doi: 10.3758/s13428-023-02099-0. Epub 2023 Aug 1.

DOI:10.3758/s13428-023-02099-0

PMID:37540466

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10991020/

Abstract

Principal component analysis (PCA) is an important tool for analyzing large collections of variables. It functions both as a pre-processing tool that summarizes many variables into components and as a method for revealing structure in data. Different coefficients play a central role in these two uses: one focuses on the weights when the goal is summarization, and one inspects the loadings when the goal is to reveal structure. It is well known that the solutions to the two approaches can be found by singular value decomposition; weights, loadings, and right singular vectors are mathematically equivalent. What is often overlooked is that they are no longer equivalent in sparse PCA methods, which induce zeros in either the weights or the loadings. The lack of awareness of this difference has led to questionable research practices in sparse PCA. First, in simulation studies, data are mostly generated from structures with sparse singular vectors or sparse loadings, neglecting structures with sparse weights. Second, reported results represent local optima because the iterative routines are often initialized with the right singular vectors. In this paper, we critically re-assess sparse PCA methods by also including data-generating schemes characterized by sparse weights and different initialization strategies. The results show that relying on commonly used data-generating models can lead to over-optimistic conclusions. They also highlight the impact of the choice between sparse-weights and sparse-loadings methods and of the initialization strategy. The practical consequences of this choice are illustrated with empirical datasets.
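The distinction between weights and loadings can be made concrete with a small numerical sketch. The NumPy example below is an illustration under stated assumptions, not the procedure used in the paper: the data are random, the loadings are taken to be the least-squares coefficients that reconstruct the data from the component scores, and sparsity is imposed by simple hard thresholding (a stand-in for an actual sparse PCA method). The names `loadings_for` and `keep_largest` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 10, 2  # observations, variables, components (arbitrary choices)

# Simulated, column-centered data with correlated variables (assumption).
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X -= X.mean(axis=0)


def loadings_for(weights):
    """Least-squares loadings P such that X is approximated by (X @ weights) @ P.T."""
    scores = X @ weights
    coef, *_ = np.linalg.lstsq(scores, X, rcond=None)
    return coef.T


def keep_largest(weights, m):
    """Keep the m largest-magnitude entries per column, zero the rest, rescale to unit norm."""
    sparse = np.zeros_like(weights)
    for j in range(weights.shape[1]):
        idx = np.argsort(np.abs(weights[:, j]))[-m:]
        sparse[idx, j] = weights[idx, j]
    return sparse / np.linalg.norm(sparse, axis=0)


# Ordinary PCA: the weights are the first k right singular vectors.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
weights_dense = Vt[:k].T
loadings_dense = loadings_for(weights_dense)

# Sparse weights via hard thresholding (illustrative only).
weights_sparse = keep_largest(weights_dense, m=3)
loadings_sparse = loadings_for(weights_sparse)

print("dense case, loadings equal weights:",
      np.allclose(loadings_dense, weights_dense))
print("sparse case, loadings equal weights:",
      np.allclose(loadings_sparse, weights_sparse))
print("zeros in sparse weights:", np.count_nonzero(weights_sparse == 0))
print("zeros in matching loadings:", np.count_nonzero(loadings_sparse == 0))
```

In the dense case, the least-squares loadings coincide with the right singular vectors, which is the equivalence the abstract refers to. Once zeros are forced into the weights, the loadings that best reconstruct the data are generally neither sparse nor equal to the weights, so the two sets of coefficients have to be interpreted separately.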

Similar articles

A Guide for Sparse PCA: Model Comparison and Applications.

Psychometrika. 2021 Dec;86(4):893-919. doi: 10.1007/s11336-021-09773-2. Epub 2021 Jun 29.

Super-sparse principal component analyses for high-throughput genomic data.

BMC Bioinformatics. 2010 Jun 2;11:296. doi: 10.1186/1471-2105-11-296.

Incorporating biological information in sparse principal component analysis with application to genomic data.

BMC Bioinformatics. 2017 Jul 11;18(1):332. doi: 10.1186/s12859-017-1740-7.

Edge-group sparse PCA for network-guided high dimensional data analysis.

Bioinformatics. 2018 Oct 15;34(20):3479-3487. doi: 10.1093/bioinformatics/bty362.

Sparse principal component analysis in cancer research.

Transl Cancer Res. 2014 Jun;3(3):182-190. doi: 10.3978/j.issn.2218-676X.2014.05.06.

A Class-Information-Based Sparse Component Analysis Method to Identify Differentially Expressed Genes on RNA-Seq Data.

IEEE/ACM Trans Comput Biol Bioinform. 2016 Mar-Apr;13(2):392-8. doi: 10.1109/TCBB.2015.2440265.

Supervised Discriminative Sparse PCA for Com-Characteristic Gene Selection and Tumor Classification on Multiview Biological Data.

IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):2926-2937. doi: 10.1109/TNNLS.2019.2893190. Epub 2019 Feb 22.

Biclustering via sparse singular value decomposition.

Biometrics. 2010 Dec;66(4):1087-95. doi: 10.1111/j.1541-0420.2010.01392.x.

Sparse Versus Simple Structure Loadings.

Psychometrika. 2015 Sep;80(3):776-90. doi: 10.1007/s11336-014-9416-y. Epub 2014 Aug 1.
