Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, P.O. Box 15000, Halifax, Nova Scotia B3H 4R2, Canada.
Anal Chem. 2020 Jan 21;92(2):1755-1762. doi: 10.1021/acs.analchem.9b03166. Epub 2019 Dec 31.
Sparse projection pursuit analysis (SPPA), a new approach for the unsupervised exploration of high-dimensional chemical data, is proposed as an alternative to traditional exploratory methods such as principal components analysis (PCA) and hierarchical cluster analysis (HCA). Where traditional methods use variance and distance metrics for data compression and visualization, the proposed method incorporates the fourth statistical moment (kurtosis) to access interesting subspaces that can clarify relationships within complex data sets. The quasi-power algorithm used for projection pursuit is coupled with a genetic algorithm for variable selection to efficiently generate sparse projection vectors that improve the chemical interpretability of the results while at the same time mitigating the problem of overmodeling. Several multivariate chemical data sets are employed to demonstrate that SPPA can reveal meaningful clusters in the data where other unsupervised methods cannot.
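The role of kurtosis as a projection index can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the helper names (`kurtosis`, `variance`, `project`) and the exhaustive search over one-variable projection vectors are assumptions made for illustration, and the search stands in for the quasi-power and genetic algorithms named in the abstract. The sketch shows only that a two-cluster variable gives a projection with low kurtosis even when another variable carries far more variance.

```python
import random

def kurtosis(values):
    """Fourth standardized moment (not excess); roughly 3 for Gaussian data."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m4 / (m2 ** 2)

def variance(values):
    """Population variance, the quantity a PCA-style criterion would favour."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def project(rows, weights):
    """Project each sample (row) onto a projection vector."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]

# Hypothetical data: 200 samples, 3 variables.
# Variable 0 separates two clusters; variables 1 and 2 are Gaussian noise,
# and variable 1 has the largest variance.
rng = random.Random(0)
rows = []
for i in range(200):
    centre = -2.0 if i < 100 else 2.0
    rows.append([
        rng.gauss(centre, 0.5),
        rng.gauss(0.0, 5.0),
        rng.gauss(0.0, 1.0),
    ])

n_vars = 3
# Sparse candidates: projection vectors with a single nonzero weight.
candidates = [[1.0 if j == k else 0.0 for j in range(n_vars)]
              for k in range(n_vars)]

kurt = [kurtosis(project(rows, w)) for w in candidates]
var = [variance(project(rows, w)) for w in candidates]

# Minimizing kurtosis picks the clustered variable;
# maximizing variance picks the noisy one.
best_by_kurtosis = min(range(n_vars), key=lambda k: kurt[k])
best_by_variance = max(range(n_vars), key=lambda k: var[k])

for k in range(n_vars):
    print(f"variable {k}: variance={var[k]:.2f}, kurtosis={kurt[k]:.2f}")
print("selected by minimum kurtosis:", best_by_kurtosis)
print("selected by maximum variance:", best_by_variance)
```

In this toy case the variance criterion selects the high-variance noise variable, while the kurtosis criterion selects the variable that separates the two groups. Restricting the projection vector to a few nonzero weights also makes it clear which variables drive the separation, which is the interpretability argument the abstract makes for sparsity.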