Principal component analysis: a review and recent developments.

Author information

Jolliffe Ian T, Cadima Jorge

Affiliations

College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK.

Secção de Matemática (DCEB), Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa 1340-017, Portugal.

Centro de Estatística e Aplicações da Universidade de Lisboa (CEAUL), Lisboa, Portugal.

Publication information

Philos Trans A Math Phys Eng Sci. 2016 Apr 13;374(2065):20150202. doi: 10.1098/rsta.2015.0202.

PMID: 26953178

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4792409/

Abstract

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
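The abstract's core construction, new uncorrelated variables that successively maximize variance and are found by solving an eigenvalue/eigenvector problem, can be sketched in a few lines of Python with NumPy. The sketch assumes the standard covariance-matrix formulation. After centring the n × p data matrix X, you maximize the variance a'Sa of a linear combination Xa subject to a'a = 1. Using a Lagrange multiplier, this leads to the eigen-equation Sa = λa, and the variance of the resulting component equals λ. The function name `pca`, the toy data and the choice k = 2 are hypothetical illustrations, not code from the article.

```python
import numpy as np


def pca(X, k):
    """Minimal PCA sketch: eigendecomposition of the sample covariance matrix.

    X : (n, p) array, rows are observations, columns are variables.
    k : number of principal components to keep (hypothetical parameter name).
    Returns the (n, k) scores, the (p, k) loadings and all p eigenvalues
    sorted in decreasing order.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    S = np.cov(Xc, rowvar=False)             # p x p sample covariance (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)     # S is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :k]                # unit-norm vectors a_1, ..., a_k
    scores = Xc @ loadings                   # the new variables (principal components)
    return scores, loadings, eigvals


# Hypothetical toy data: 200 observations of 3 variables, two of them strongly correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

scores, loadings, eigvals = pca(X, k=2)
retained = eigvals[:2].sum() / eigvals.sum()  # share of total variance kept by 2 components
```

On this toy data, the score columns are uncorrelated and their sample variances equal the two leading eigenvalues. The value of `retained` is close to 1, which illustrates the abstract's goal of reducing dimensionality while minimizing information loss. The sketch uses `np.linalg.eigh` rather than `np.linalg.eig` because S is symmetric, so its eigenvalues are real and its eigenvectors can be taken orthonormal.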

Similar articles

Principal component analysis of texture features derived from FDG PET images of melanoma lesions.

EJNMMI Phys. 2022 Sep 15;9(1):64. doi: 10.1186/s40658-022-00491-x.

Principal component analysis of dynamic contrast enhanced MRI in human prostate cancer.

Invest Radiol. 2010 Apr;45(4):174-81. doi: 10.1097/RLI.0b013e3181d0a02f.

Adaptive dimensionality reduction for neural network-based online principal component analysis.

PLoS One. 2021 Mar 30;16(3):e0248896. doi: 10.1371/journal.pone.0248896. eCollection 2021.

Eigenanatomy: sparse dimensionality reduction for multi-modal medical image analysis.

Methods. 2015 Feb;73:43-53. doi: 10.1016/j.ymeth.2014.10.016. Epub 2014 Oct 22.

Improved Interpretability of Brain-Behavior CCA With Domain-Driven Dimension Reduction.

Front Neurosci. 2022 Jun 23;16:851827. doi: 10.3389/fnins.2022.851827. eCollection 2022.

Multivariate methods for the analysis of complex and big data in forensic sciences. Application to age estimation in living persons.

Forensic Sci Int. 2016 Sep;266:581.e1-581.e9. doi: 10.1016/j.forsciint.2016.05.014. Epub 2016 May 21.

Penalized Principal Component Analysis Using Smoothing.

ArXiv. 2025 Mar 3:arXiv:2309.13838v2.

Comparing patterns of component loadings: principal component analysis (PCA) versus independent component analysis (ICA) in analyzing multivariate non-normal data.

Behav Res Methods. 2012 Dec;44(4):1239-43. doi: 10.3758/s13428-012-0193-1.

Stochastic convex sparse principal component analysis.

EURASIP J Bioinform Syst Biol. 2016 Sep 9;2016(1):15. doi: 10.1186/s13637-016-0045-x. eCollection 2016 Dec.

Cited by

Comparative analysis of stress levels and coping strategies in parents of neurodivergent and neurotypical children.

Front Child Adolesc Psychiatry. 2025 Aug 22;4:1619993. doi: 10.3389/frcha.2025.1619993. eCollection 2025.

High-Speed Atomic Force Microscopy Reveals the Dynamic Interplay of Membrane Proteins is Lipid-Modulated.

Small Sci. 2025 Jul 8;5(9):2500258. doi: 10.1002/smsc.202500258. eCollection 2025 Sep.

A repetitive amplitude encoding method for enhancing the mapping ability of quantum neural networks.

Sci Rep. 2025 Sep 1;15(1):32111. doi: 10.1038/s41598-025-17651-5.

A transformer-based embedding approach to developing short-form psychological measures.

Front Psychol. 2025 Aug 13;16:1640864. doi: 10.3389/fpsyg.2025.1640864. eCollection 2025.

Comprehensive Analysis of Gastrointestinal Injury Induced by Nonsteroidal Anti-Inflammatory Drugs Using Data from FDA Adverse Event Reporting System Database.

Pharmaceuticals (Basel). 2025 Aug 14;18(8):1204. doi: 10.3390/ph18081204.

Polyphenolic Profile and Biological Activities in HT29 Intestinal Epithelial Cells of Fruit Extract.

Int J Mol Sci. 2025 Aug 14;26(16):7851. doi: 10.3390/ijms26167851.

Initial Development and Psychometric Validation of the Self-Efficacy Scale for Informational Reading Strategies in Teacher Candidates.

Behav Sci (Basel). 2025 Jul 23;15(8):1002. doi: 10.3390/bs15081002.

PDMS Membranes Drilled by Proton Microbeam Writing: A Customizable Platform for the Investigation of Endothelial Cell-Substrate Interactions in Transwell-like Devices.

J Funct Biomater. 2025 Jul 28;16(8):274. doi: 10.3390/jfb16080274.

Machine Learning-Driven Insights in Cancer Metabolomics: From Subtyping to Biomarker Discovery and Prognostic Modeling.

Metabolites. 2025 Aug 1;15(8):514. doi: 10.3390/metabo15080514.

A New Method for Dynamic Brain Connectivity Analysis Based on Tensor Decomposition in Tinnitus Using High-density Electroencephalogram in Source Domain.

J Med Signals Sens. 2025 Aug 6;15:23. doi: 10.4103/jmss.jmss_75_24. eCollection 2025.