A Guide for Sparse PCA: Model Comparison and Applications.

Affiliations

Department of Methodology and Statistics, Tilburg University, Prof. Cobbenhagenlaan 225, Simon Building, Room S 820, 5037 DB Tilburg, The Netherlands.

Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands.

Publication details

Psychometrika. 2021 Dec;86(4):893-919. doi: 10.1007/s11336-021-09773-2. Epub 2021 Jun 29.

DOI:10.1007/s11336-021-09773-2
PMID:34185214
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8636462/
Abstract

PCA is a popular tool for exploring and summarizing multivariate data, especially data consisting of many variables. PCA, however, is often not simple to interpret, because each component is a linear combination of all the variables. To address this issue, numerous methods have been proposed to reduce the number of nonzero coefficients in the components. These include rotation-thresholding methods and, more recently, PCA methods subject to sparsity-inducing penalties or constraints. Here, we offer guidelines on how to choose among the different sparse PCA methods. The current literature lacks clear guidance on the properties and performance of the different sparse PCA methods. It often relies on the misconception that the equivalence of the formulations for ordinary PCA also holds for sparse PCA. To guide potential users of sparse PCA methods, we first discuss several popular sparse PCA methods in terms of three aspects: whether sparseness is imposed on the loadings or on the weights, the assumed model, and the optimization criterion used to impose sparseness. Second, using an extensive simulation study, we assess each of these methods by means of performance measures such as squared relative error, misidentification rate, and percentage of explained variance. The assessment covers several data-generating models and several conditions for the population model. Finally, two examples using empirical data are considered.
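
The abstract distinguishes sparseness imposed on the weights from sparseness imposed on the loadings. It also warns that equivalences from ordinary PCA need not carry over to sparse PCA. The sketch below illustrates that point with a deliberately simple approach, not one of the specific methods compared in the paper. It finds the first ordinary component by power iteration and sparsifies the weights by plain thresholding, keeping the k largest magnitudes. It then computes the least-squares loadings implied by the resulting scores. The thresholding rule, the function names, and the small example matrix are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical example data: 4 observations x 4 variables, columns already centered.
EXAMPLE_X = [
    [2.0, 1.8, 0.5, -0.3],
    [-1.0, -1.2, 0.4, 0.2],
    [0.5, 0.7, -1.0, 0.1],
    [-1.5, -1.3, 0.1, 0.0],
]


def cross_product(X):
    """Return X^T X for X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def first_weights(X, iters=500):
    """Leading eigenvector of X^T X by power iteration (ordinary PCA, one component).

    Assumes a clear gap between the first two eigenvalues and a start vector
    that is not orthogonal to the leading eigenvector.
    """
    S = cross_product(X)
    w = normalize([1.0] * len(S))
    for _ in range(iters):
        w = normalize([sum(a * b for a, b in zip(row, w)) for row in S])
    return w


def threshold(w, k):
    """Hypothetical sparsifier: keep the k largest-magnitude weights, zero the rest, renormalize."""
    keep = set(sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k])
    return normalize([w[j] if j in keep else 0.0 for j in range(len(w))])


def implied_loadings(X, w):
    """Least-squares loadings p = X^T t / (t^T t) for the scores t = X w."""
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    tt = sum(x * x for x in t)
    return [sum(X[i][j] * t[i] for i in range(len(X))) / tt for j in range(len(w))]


if __name__ == "__main__":
    w = first_weights(EXAMPLE_X)
    print("PCA weights     :", [round(x, 3) for x in w])
    print("PCA loadings    :", [round(x, 3) for x in implied_loadings(EXAMPLE_X, w)])
    ws = threshold(w, k=2)
    print("sparse weights  :", [round(x, 3) for x in ws])
    print("implied loadings:", [round(x, 3) for x in implied_loadings(EXAMPLE_X, ws)])
```

For ordinary PCA the weight and loading vectors coincide. The weights are a unit eigenvector of X^T X, so X^T X w / (w^T X^T X w) reduces to w. After thresholding, the weights of the last two variables are exactly zero. Their implied loadings, however, should come out nonzero, because those variables are correlated with the retained ones. This is why a method that makes the weights sparse does not, in general, yield sparse loadings.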
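
The simulation study scores each method by squared relative error, misidentification rate, and percentage of explained variance. The abstract does not give formulas, so the sketch below uses common definitions as assumptions:

- Squared relative error is ||estimate - true||^2 / ||true||^2 after matching the sign.
- Misidentification rate is the share of coefficients whose zero/nonzero status is wrong.
- Percentage of explained variance, for a single component, is 100 * (1 - ||X - t p^T||^2 / ||X||^2) with scores t = X w.

The paper's exact definitions may differ, for example in how components are matched in order and sign. Multi-component variants with correlated sparse components are not covered here. All function names are hypothetical.

```python
def align_sign(est, true):
    """Flip est if it points away from true (components are identified only up to sign)."""
    dot = sum(e * t for e, t in zip(est, true))
    return list(est) if dot >= 0 else [-e for e in est]


def squared_relative_error(est, true):
    """||est - true||^2 / ||true||^2 after sign alignment (assumed definition)."""
    est = align_sign(est, true)
    num = sum((e - t) ** 2 for e, t in zip(est, true))
    return num / sum(t * t for t in true)


def misidentification_rate(est, true, tol=0.0):
    """Share of coefficients whose zero/nonzero status differs (assumed definition)."""
    wrong = sum((abs(e) > tol) != (abs(t) > tol) for e, t in zip(est, true))
    return wrong / len(true)


def percent_explained_variance(X, w, p):
    """One component: 100 * (1 - ||X - t p^T||^2 / ||X||^2) with scores t = X w."""
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    resid = sum((X[i][j] - t[i] * p[j]) ** 2
                for i in range(len(X)) for j in range(len(p)))
    total = sum(x * x for row in X for x in row)
    return 100.0 * (1.0 - resid / total)


if __name__ == "__main__":
    true_w = [0.6, 0.8, 0.0, 0.0]       # hypothetical population weights
    est_w = [-0.58, -0.81, -0.05, 0.0]  # hypothetical estimate, opposite sign
    print("squared relative error:", squared_relative_error(est_w, true_w))
    print("misidentification rate:", misidentification_rate(est_w, true_w))
```

In this example, one of the four zero/nonzero decisions is wrong (the third coefficient), so the misidentification rate is 0.25. The sign flip alone does not add to the squared relative error.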

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/84952aaba980/11336_2021_9773_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/7c7a80247e7a/11336_2021_9773_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/33d927864580/11336_2021_9773_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/1e002572fd6f/11336_2021_9773_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/db325031e98b/11336_2021_9773_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/e723f6fb1bf5/11336_2021_9773_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/9eec016d96cc/11336_2021_9773_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/2ebeb205aeec/11336_2021_9773_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dfd/8636462/cca2b10275b2/11336_2021_9773_Fig9_HTML.jpg

Similar articles

1
A Guide for Sparse PCA: Model Comparison and Applications.
Psychometrika. 2021 Dec;86(4):893-919. doi: 10.1007/s11336-021-09773-2. Epub 2021 Jun 29.
2
A critical assessment of sparse PCA (research): why (one should acknowledge that) weights are not loadings.
Behav Res Methods. 2024 Mar;56(3):1413-1432. doi: 10.3758/s13428-023-02099-0. Epub 2023 Aug 1.
3
Sparse Principal Component Analysis via Rotation and Truncation.
IEEE Trans Neural Netw Learn Syst. 2016 Apr;27(4):875-90. doi: 10.1109/TNNLS.2015.2427451. Epub 2015 Dec 22.
4
Structured Sparse Principal Components Analysis With the TV-Elastic Net Penalty.
IEEE Trans Med Imaging. 2018 Feb;37(2):396-407. doi: 10.1109/TMI.2017.2749140. Epub 2017 Sep 4.
5
Principal Component Analysis Based on Graph Laplacian and Double Sparse Constraints for Feature Selection and Sample Clustering on Multi-View Data.
Hum Hered. 2019;84(1):47-58. doi: 10.1159/000501653. Epub 2019 Aug 29.
6
Multilinear sparse principal component analysis.
IEEE Trans Neural Netw Learn Syst. 2014 Oct;25(10):1942-50. doi: 10.1109/TNNLS.2013.2297381.
7
Sparse Principal Component Analysis With Preserved Sparsity Pattern.
IEEE Trans Image Process. 2019 Jul;28(7):3274-3285. doi: 10.1109/TIP.2019.2895464. Epub 2019 Jan 25.
8
Incorporating biological information in sparse principal component analysis with application to genomic data.
BMC Bioinformatics. 2017 Jul 11;18(1):332. doi: 10.1186/s12859-017-1740-7.
9
Stochastic convex sparse principal component analysis.
EURASIP J Bioinform Syst Biol. 2016 Sep 9;2016(1):15. doi: 10.1186/s13637-016-0045-x. eCollection 2016 Dec.
10
Super-sparse principal component analyses for high-throughput genomic data.
BMC Bioinformatics. 2010 Jun 2;11:296. doi: 10.1186/1471-2105-11-296.

Cited by

1
Semisynthetic simulation for microbiome data analysis.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf051.
2
Comparative evaluation of feature reduction methods for drug response prediction.
Sci Rep. 2024 Dec 28;14(1):30885. doi: 10.1038/s41598-024-81866-1.
3
Topological data analysis expands the genotype to phenotype map for 3D maize root system architecture.