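Effects of nonlinearities and uncorrelated or correlated errors in realistic simulated data on the prediction abilities of augmented classical least squares and partial least squares.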

Melgaard David K, Haaland David M

Sandia National Laboratories, Albuquerque, New Mexico 87185-0889, USA.

Appl Spectrosc. 2004 Sep;58(9):1065-73. doi: 10.1366/0003702041959334.

Prediction models from the new augmented classical least squares (ACLS) and partial least squares (PLS) multivariate spectral analysis methods were compared using simulated data containing deviations from the idealized model. The simulated data were based on pure spectral components derived from real near-infrared spectra of multicomponent dilute aqueous solutions. Simulated uncorrelated concentration errors, uncorrelated and correlated spectral noise, and nonlinear spectral responses were included to evaluate the methods in situations representative of experimental data. The statistical significance of differences in prediction ability was evaluated using the Wilcoxon signed rank test. The prediction differences were found to depend on the type of noise added, the number of calibration samples, and the component being predicted. For analyses applied to simulated spectra with noise-free nonlinear response, PLS was shown to be statistically superior to ACLS for most of the cases. With added uncorrelated spectral noise, both methods performed comparably. Using 50 calibration samples with simulated correlated spectral noise, PLS showed an advantage in 3 out of 9 cases, but the advantage dropped to 1 out of 9 cases with 25 calibration samples. For cases with different noise distributions between calibration and validation, ACLS predictions were statistically better than PLS for two of the four components. Also, when experimentally derived correlated spectral error was added, ACLS gave statistically significantly better predictions in 15 of the 24 simulated cases. On data sets with nonuniform noise, neither method was statistically better, although ACLS usually had smaller standard errors of prediction (SEPs). The varying results emphasize the need to use realistic simulations when making comparisons between various multivariate calibration methods. Even when the differences between the SEPs were statistically significant, the differences were small in most cases. This study demonstrated that, unlike CLS, ACLS is competitive with PLS in modeling nonlinearities in spectra without knowledge of all the component concentrations. This competitiveness is important when maintaining and transferring models for system drift, spectrometer differences, and unmodeled components, since ACLS models can be rapidly updated during prediction when used in conjunction with the prediction augmented classical least squares (PACLS) method, while PLS requires full recalibration.
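
As an illustration of the comparison described above, the following is a minimal sketch of how an SEP and a Wilcoxon signed rank test on paired prediction errors might be computed. It is not the authors' code. The abstract does not state which quantity was paired, so pairing per-sample absolute prediction errors is an assumption, as are the function names `sep` and `wilcoxon_signed_rank`, the definition of SEP as a root-mean-square residual without bias correction, and the use of the large-sample normal approximation without tie or continuity correction.

```python
import math


def sep(predicted, reference):
    """Root-mean-square prediction error over a validation set (assumed SEP definition)."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed rank test on paired absolute errors.

    Uses the normal approximation with no tie or continuity correction,
    so the p-value is only approximate for small samples.
    Returns (W_plus, p_value).
    """
    diffs = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    diffs = [d for d in diffs if d != 0.0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks for pairs where method A had the larger absolute error.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value
```

In use, `errors_a` and `errors_b` would hold the per-sample prediction residuals of the two calibration methods on the same validation samples; a small p-value suggests that one method's absolute errors are systematically larger than the other's, which can hold even when the two SEPs differ only slightly.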
