J Renwick Beattie Consulting, Ballycastle, UK.
Esmonde-White Technologies, Ann Arbor, MI, USA.
Appl Spectrosc. 2021 Apr;75(4):361-375. doi: 10.1177/0003702820987847. Epub 2021 Jan 22.
Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal component analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. Many applied analytical scientists and spectroscopists who use principal component analysis do not have a firm grasp of the linear algebra underpinning it, and the meaning of the features it identifies is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind principal component analysis, with each step illustrated by simulated spectra. Principal component analysis relies solely on the information within the spectra; consequently, the mathematical model depends on the nature of the data itself. The direct links between model and spectra allow a concrete spectroscopic explanation of principal component analysis, such as the scores representing "concentration" or "weights". The principal components (PCs, or loadings) are by definition hidden, repeated, and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a principal component analysis model shows how to interpret the application-specific chemical meaning of the loadings and how to analyze the scores. A critical benefit of principal component analysis is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.
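The relationship between spectra, scores, and loadings described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure used in the manuscript: the Gaussian band shapes, the two-component mixture, the noise level, and all variable names are hypothetical, and the decomposition is done with a singular value decomposition of the mean-centered data, which is one common way to compute PCA.

```python
import numpy as np

def gaussian(x, center, width):
    """Unit-height Gaussian band, a hypothetical stand-in for a spectral peak."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 200)  # hypothetical spectral axis (arbitrary units)

# Two hypothetical pure-component spectra, shape (2, 200).
pure = np.vstack([gaussian(x, 30.0, 4.0), gaussian(x, 65.0, 6.0)])

# Hypothetical concentrations for 50 samples; each observed spectrum is a
# linear combination of the pure spectra plus a small amount of noise.
conc = rng.uniform(0.0, 1.0, size=(50, 2))
spectra = conc @ pure + rng.normal(0.0, 0.01, size=(50, 200))

# Mean-center so the model describes variation around the average spectrum.
mean_spectrum = spectra.mean(axis=0)
centered = spectra - mean_spectrum

# SVD of the centered data: rows of Vt are the loadings (orthonormal spectral
# shapes); U scaled by the singular values gives the scores (per-sample weights).
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
loadings = Vt
scores = U * s

def reconstruct(n_pcs):
    """Rebuild the spectra from the mean spectrum plus the first n_pcs components."""
    return mean_spectrum + scores[:, :n_pcs] @ loadings[:n_pcs]

# Residual norm between original and reconstructed data for 0, 1, 2, 3 PCs.
residuals = [np.linalg.norm(spectra - reconstruct(k)) for k in range(4)]
print(residuals)
```

In this simulated case the residual drops with each added PC, and after two PCs it is dominated by the added noise, since only two spectral shapes vary in the data. The loadings are not the pure-component spectra themselves; because they describe variation around the mean, they mix positive and negative contributions from both bands.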