Hsu Ying-Lin, Huang Po-Yu, Chen Dung-Tsa
Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan.
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, USA.
Transl Cancer Res. 2014 Jun;3(3):182-190. doi: 10.3978/j.issn.2218-676X.2014.05.06.
A critical challenge in analyzing high-dimensional data in cancer research is how to reduce the dimension of the data and extract relevant features. Sparse principal component analysis (PCA) is a powerful statistical tool that can reduce data dimension and select important variables simultaneously. In this paper, we review several approaches to sparse PCA, including variance maximization (VM), reconstruction error minimization (REM), singular value decomposition (SVD), and probabilistic modeling (PM). A simulation study is conducted to compare PCA with the sparse PCA methods. A published gene signature in a lung cancer dataset is used to illustrate the potential application of sparse PCA in cancer research.
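To make the idea of reducing dimension and selecting variables at the same time concrete, the following is a minimal sketch of one SVD-type formulation: a rank-one approximation whose loading vector is soft-thresholded at each iteration. It is not the authors' code or their simulation design. The function names (`soft_threshold`, `sparse_pc1`), the penalty value `lam`, and the toy data (4 signal variables out of 20) are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_pc1(X, lam, n_iter=200, tol=1e-8):
    """First sparse loading vector via alternating updates with soft-thresholding.

    Hypothetical helper for illustration: an SVD-type rank-one sparse PCA sketch.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]                                    # start from the ordinary PCA loading
    for _ in range(n_iter):
        u = Xc @ v
        norm_u = np.linalg.norm(u)
        if norm_u == 0:
            break
        u /= norm_u                              # unit-length score direction
        v_new = soft_threshold(Xc.T @ u, lam)    # sparsify the loading vector
        norm_v = np.linalg.norm(v_new)
        if norm_v == 0:                          # penalty too large: all loadings removed
            return np.zeros_like(v)
        v_new /= norm_v
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return v


# Toy data (assumed design): the first k variables share one latent factor,
# the remaining variables are pure noise.
rng = np.random.default_rng(0)
n, p, k = 100, 20, 4
z = rng.normal(size=n)
X = rng.normal(scale=0.5, size=(n, p))
X[:, :k] += 2.0 * z[:, None]

# Ordinary PCA loading: dense, every variable gets a nonzero weight.
Xc = X - X.mean(axis=0)
v_pca = np.linalg.svd(Xc, full_matrices=False)[2][0]

# Sparse loading: small weights are shrunk to exactly zero.
v_sparse = sparse_pc1(X, lam=5.0)

print("nonzero loadings, PCA:       ", np.count_nonzero(v_pca))
print("nonzero loadings, sparse PCA:", np.count_nonzero(v_sparse))
print("selected variables:          ", np.flatnonzero(v_sparse))
```

With these settings the penalty is well above the size of the noise loadings and well below the size of the signal loadings, so the sparse vector should keep only the signal variables, while the ordinary PCA vector assigns a small nonzero weight to every variable. In practice `lam` would be tuned, for example by cross-validation, rather than fixed by hand.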