Spiegelman C H, McShane M J, Goetz M J, Motamedi M, Yue Q L, Coté G L
Department of Statistics and Biomedical Engineering Program, Texas A&M University, College Station, Texas 77845, and Biomedical Engineering Center, Laser & Spectroscopy Program, University of Texas Medical Branch, Galveston, Texas 77550.
Anal Chem. 1998 Jan 1;70(1):35-44. doi: 10.1021/ac9705733.
The mathematical basis for the improvement in calibration obtained by selecting informative variables for partial least-squares (PLS) calibration has been identified. A theoretical investigation of calibration slopes indicates that including uninformative wavelengths negatively affects calibrations, producing both a large relative bias toward zero and a small additive bias away from the origin. These theoretical results hold regardless of the noise distribution in the data. Studies were performed to confirm this result, comparing a previously used selection method with a new method designed to perform better on data containing large outlying points by incorporating estimates of spectral residuals. Three data sets with different noise distributions were tested. In the first, Gaussian and log-normal noise were added to simulated data containing a single peak. Second, near-infrared spectra of glucose in cell culture media, acquired with an FT-IR spectrometer, were analyzed. Finally, dispersive Raman Stokes spectra of glucose dissolved in water were assessed. In every case considered here, selection produced improved prediction, but data with different noise characteristics showed varying degrees of improvement depending on the selection method used. The practical results confirmed that incorporating residuals into the ranking criteria improves selection for data whose noise distributions produce large outliers. It was concluded that a selection algorithm should be designed with the spectral noise distribution of the input data in mind, to increase the likelihood of successful and appropriate selection.
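The slope argument above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code, and it rests on several assumptions: a single Gaussian band whose height equals the analyte concentration, i.i.d. Gaussian noise on every channel, a one-component NIPALS PLS1 model written directly in NumPy, and a plain |correlation| screen as a stand-in selection criterion (neither of the selection methods compared in the paper). All function names and settings (`simulate_single_peak`, `rank_by_correlation`, 2000 channels, 40 training samples, keeping 11 channels) are hypothetical choices for illustration. The script fits PLS once on all channels and once on the selected channels, then regresses predicted on reference concentration for an independent test set.

```python
import numpy as np


def simulate_single_peak(n_samples, n_channels, rng, centre=50, width=3.0, noise_sd=1.0):
    """Hypothetical spectra: one Gaussian band scaled by concentration, plus
    i.i.d. Gaussian noise on every channel (all channels off the band are uninformative)."""
    channels = np.arange(n_channels)
    peak = np.exp(-0.5 * ((channels - centre) / width) ** 2)
    conc = rng.uniform(1.0, 10.0, n_samples)
    X = np.outer(conc, peak) + rng.normal(0.0, noise_sd, (n_samples, n_channels))
    return X, conc


def pls1_fit(X, y, n_components=1):
    """Minimal NIPALS PLS1; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        q = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X and y before the next component
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def rank_by_correlation(X, y):
    """Stand-in screening criterion (an assumption): sort channels by |corr(channel, y)|."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))


def calibration_line(y_true, y_pred):
    """Slope and intercept of predicted vs. reference concentration."""
    slope, intercept = np.polyfit(y_true, y_pred, 1)
    return slope, intercept


rng = np.random.default_rng(0)
X_train, y_train = simulate_single_peak(40, 2000, rng)
X_test, y_test = simulate_single_peak(200, 2000, rng)

# All channels: the band plus roughly 1990 uninformative ones.
model_all = pls1_fit(X_train, y_train)
slope_all, icpt_all = calibration_line(y_test, pls1_predict(model_all, X_test))

# Keep the 11 top-ranked channels, ranked on the training set only.
keep = rank_by_correlation(X_train, y_train)[:11]
model_sel = pls1_fit(X_train[:, keep], y_train)
slope_sel, icpt_sel = calibration_line(y_test, pls1_predict(model_sel, X_test[:, keep]))

print(f"all channels:      slope={slope_all:.2f}  intercept={icpt_all:.2f}")
print(f"selected channels: slope={slope_sel:.2f}  intercept={icpt_sel:.2f}")
```

Under these assumptions the all-channel slope should fall well below 1: noise in the many uninformative channels partly aligns with the training concentrations and inflates the PLS scores, so predictions are shrunk toward the training mean and the intercept moves away from zero. Restricting the model to the band keeps the slope near 1. The residual-aware ranking the abstract describes, aimed at outlier-prone noise, is not reproduced here.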