Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, PO Box 15000, Halifax, NS, B3H 4R2, Canada.
Institute for Advanced Study in Basic Sciences, GavaZang, Zanjan, 45137-66731, Iran.
Anal Methods. 2021 Sep 30;13(37):4188-4219. doi: 10.1039/d1ay01124c.
Multivariate data analysis tools have become an integral part of modern analytical chemistry, and principal component analysis (PCA) is perhaps foremost among these. PCA is central in approaching many problems in data exploration, classification, calibration, modelling, and curve resolution. However, PCA is only one form of a broader group of factor analysis (FA) methods that are rarely employed by chemists. The dominance of PCA in chemistry is primarily a consequence of history and convenience, but this has obscured the potential advantages of other FA tools that are widely used in other fields. The purpose of this article, which is intended for those who are already familiar with the mathematical foundations and applications of PCA, is to develop a framework to relate PCA to other commonly used FA methods from the perspective of chemical applications. Specifically, PCA is compared to maximum likelihood factor analysis (MLFA), principal axis factorization (PAF) and maximum likelihood PCA (MLPCA). Similarities and differences are highlighted with regard to the assumptions and constraints of the models, algorithms employed, and calculation of scores and loadings. Practical aspects such as data dimensionality, preprocessing, rank estimation, improper solutions (Heywood cases), and software implementation are considered. The performance of the four methods is compared using both simulated and experimental data sets. While PCA provides the most reliable estimates when measurement error variance is uniform (homoscedastic noise) and MLPCA works best when the error covariance matrix is explicitly known, MLFA and PAF have the distinct advantage of providing information about measurement uncertainty and adapting to situations of unknown heteroscedastic errors, eliminating the need for scaling. Moreover, MLFA in particular is shown to be tolerant to deviations from model linearity. These results make a strong case for increased application of other FA methods in chemistry.
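The contrast drawn above between PCA and PAF under unknown heteroscedastic errors can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' implementation: it assumes a common textbook form of principal axis factorization applied to the sample covariance matrix (iterated eigendecomposition of the covariance with its diagonal reduced by the current uniqueness estimates), and the helper names (`pca_loadings`, `paf`, `largest_principal_angle`), the simulated data, and the chosen noise variances are all hypothetical.

```python
import numpy as np


def pca_loadings(S, k):
    """PCA loadings: top-k eigenvectors of S scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return vecs[:, idx] * np.sqrt(vals[idx])


def paf(S, k, max_iter=1000, tol=1e-8):
    """Iterated principal axis factoring on a covariance matrix (assumed textbook form).

    Returns (loadings, uniquenesses); the uniquenesses estimate the
    per-variable error variances.
    """
    # Starting uniquenesses: residual variance of each variable regressed on the rest.
    psi = 1.0 / np.diag(np.linalg.inv(S))
    for _ in range(max_iter):
        reduced = S - np.diag(psi)            # remove current error-variance estimate
        vals, vecs = np.linalg.eigh(reduced)
        idx = np.argsort(vals)[::-1][:k]
        lam = np.clip(vals[idx], 0.0, None)   # guard against negative eigenvalues
        L = vecs[:, idx] * np.sqrt(lam)
        # Clipping keeps uniquenesses positive (avoids Heywood cases).
        psi_new = np.clip(np.diag(S) - np.sum(L**2, axis=1), 1e-6, None)
        converged = np.max(np.abs(psi_new - psi)) < tol
        psi = psi_new
        if converged:
            break
    return L, psi


def largest_principal_angle(A, B):
    """Largest principal angle (degrees) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s.min(), -1.0, 1.0)))


# Simulated rank-2 data with a different noise variance for every variable.
rng = np.random.default_rng(0)
n, p, k = 20000, 10, 2
L_true = rng.normal(size=(p, k))              # hypothetical true loadings
psi_true = np.linspace(0.1, 5.0, p)           # heteroscedastic error variances
scores = rng.normal(size=(n, k))
X = scores @ L_true.T + rng.normal(size=(n, p)) * np.sqrt(psi_true)
S = np.cov(X, rowvar=False)

L_pca = pca_loadings(S, k)
L_paf, psi_paf = paf(S, k)

angle_pca = largest_principal_angle(L_pca, L_true)
angle_paf = largest_principal_angle(L_paf, L_true)
print("subspace error (degrees): PCA", round(angle_pca, 2), "PAF", round(angle_paf, 2))
print("estimated error variances:", np.round(psi_paf, 2))
```

In this sketch no scaling is applied before either method. The uniquenesses returned by `paf` serve as the estimates of measurement error variance, which is the kind of uncertainty information that PCA alone does not supply.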