Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA, USA.
Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.
Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large, complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies, such as label-free mass spectrometry, because they contain large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that, compared to commonly used imputation approaches, this approach generates robust and informative low-dimensional data representations.
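To make concrete the idea of extracting principal components one at a time without imputing missing entries, here is a minimal Python sketch. It is not the sppPCA algorithm from the paper, whose details are not given in the abstract. It uses NIPALS, a different and well-known sequential PCA procedure that simply skips missing cells in each regression step. The function name `nipals_pca_missing` and all of its parameters are hypothetical, and the sketch assumes that missing values are encoded as NaN and that every column has at least one observed value.

```python
import numpy as np


def nipals_pca_missing(X, n_components=2, n_iter=500, tol=1e-8):
    """Sequential PCA (NIPALS) that ignores NaN cells instead of imputing them.

    Illustrative stand-in only; this is NOT the sppPCA method of the paper.
    Assumes every column of X has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    W = observed.astype(float)  # 1.0 where a value is present, 0.0 where missing

    # Center each column on the mean of its observed values.
    col_means = np.nanmean(X, axis=0)
    R = np.where(observed, X - col_means, 0.0)  # residuals, zero at missing cells

    n, p = X.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))

    for k in range(n_components):
        # Start from the residual column with the largest sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        pk = np.zeros(p)
        for _ in range(n_iter):
            # Loadings: regress each column on t, using observed rows only.
            pk = (R.T @ t) / np.maximum(W.T @ (t ** 2), 1e-12)
            norm = np.linalg.norm(pk)
            if norm == 0.0:
                break  # nothing left to explain
            pk /= norm
            # Scores: regress each row on pk, using observed columns only.
            t_new = (R @ pk) / np.maximum(W @ (pk ** 2), 1e-12)
            converged = np.linalg.norm(t_new - t) < tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = pk
        # Deflate: remove this component from the observed cells only.
        R = np.where(observed, R - np.outer(t, pk), 0.0)

    return scores, loadings
```

Calling `scores, loadings = nipals_pca_missing(X, n_components=2)` on a samples-by-features matrix returns one score column and one unit-length loading vector per component. Because each component is fitted only to the observed cells, no imputed values enter the decomposition. The approach does not correct for the non-random missingness that the abstract highlights.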