Department of Chemistry, Dalhousie University, Halifax, NS, Canada.
Anal Chim Acta. 2011 Oct 17;704(1-2):1-15. doi: 10.1016/j.aca.2011.08.006. Epub 2011 Aug 11.
As a powerful method for exploratory data analysis, projection pursuit (PP) often outperforms principal component analysis (PCA) in discovering important data structure. PP was proposed in the 1970s but has not been widely used in chemistry, largely because of the difficulty of optimizing projection indices. In this work, new algorithms, referred to as "quasi-power methods", are proposed to optimize kurtosis as a projection index. The new algorithms are simple, fast, and stable, which makes the search for the global optimum more efficient in the presence of multiple local optima. Maximization of kurtosis is helpful in the detection of outliers, while minimization tends to reveal clusters in the data, so the ability to search separately for the maximum and the minimum of kurtosis is desirable. The proposed algorithms can search for either with only minor changes. Unlike other methods, they require no optimization of step size, and sphering or whitening of the data is not necessary. Both univariate and multivariate kurtosis can be optimized by the proposed algorithms. The performance of the algorithms is evaluated using three simulated data sets, and their utility is demonstrated with three experimental data sets relevant to analytical chemistry.
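The abstract does not give the update equations, so the Python/NumPy sketch below is only an illustration of how a power-method-style iteration without a step size can drive univariate kurtosis up (outlier search) or down (cluster search). It is not the authors' quasi-power algorithm. The helper names `kurtosis_index` and `optimize_kurtosis`, the shift constant `c`, the random start, and the stopping rule are all assumptions of this example. Kurtosis of the scores t = Xv is taken as n * sum(t^4) / (sum(t^2))^2, and X is assumed to have full column rank.

```python
import numpy as np


def kurtosis_index(X, v):
    """Kurtosis of the scores t = X v; X is assumed to be column mean-centred."""
    t = X @ v
    return t.shape[0] * np.sum(t**4) / np.sum(t**2) ** 2


def optimize_kurtosis(X, maximize=True, n_iter=2000, tol=1e-12, seed=0):
    """Illustrative fixed-point search for a maximum- or minimum-kurtosis direction.

    Hypothetical helper, NOT the published quasi-power update. Assumes X (n x p)
    has full column rank so that G = X'X is invertible.
    """
    X = X - X.mean(axis=0)                      # mean-centre each variable
    G = X.T @ X
    # Leverages h_i = x_i' G^-1 x_i (each <= 1); c = 3 * max(h) is the shift
    # assumed here to keep the minimisation step monotone.
    h = np.sum(X * np.linalg.solve(G, X.T).T, axis=1)
    c = 3.0 * h.max()
    v = np.random.default_rng(seed).standard_normal(X.shape[1])
    v /= np.sqrt(v @ G @ v)                     # scale so that sum(t**2) == 1
    for _ in range(n_iter):
        t = X @ v
        step = np.linalg.solve(G, X.T @ t**3)   # (X'X)^-1 X' t^3
        # The only change between the two searches is this line.
        v_new = step if maximize else c * v - step
        v_new /= np.sqrt(v_new @ G @ v_new)
        converged = abs(v_new @ G @ v) > 1.0 - tol  # cosine between iterates
        v = v_new
        if converged:
            break
    v = v / np.linalg.norm(v)                   # report a unit-length direction
    return v, kurtosis_index(X, v)
```

Design note: writing u = G^(1/2) v turns the constraint sum(t^2) = 1 into a unit-norm constraint on u, so X is never explicitly sphered, although G^-1 plays that role internally. On that sphere sum(t^4) is a convex function of u, and replacing u by its normalised gradient can never lower a convex function, which is why the maximization step needs no step size. The minimization step applies the same argument to c*||u||^4 - sum(t^4), which is convex once c >= 3 * max(h_i). Because the kurtosis surface can have several local optima, one would normally rerun from several `seed` values and keep the best result.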