Yang Hui-hua, Qin Feng, Wang Yong, Wu Yun-ming, Shi Xiao-hao, Liang Qiong-lin, Wang Yi-ming, Luo Guo-an
Analysis Center, Tsinghua University, Beijing 100084, China.
Guang Pu Xue Yu Guang Pu Fen Xi. 2007 Oct;27(10):1955-8.
Partial least squares (PLS), the traditional modeling algorithm for near-infrared (NIR) spectra, cannot effectively capture the nonlinear correlations between NIR spectra and the chemical or physical properties of samples. Locally linear embedding (LLE) is a recently proposed nonlinear dimensionality reduction algorithm from the manifold learning family. It can effectively identify the intrinsic dimension of high-dimensional data and map the high-dimensional input points to global low-dimensional coordinates while preserving the spatial relations among neighboring points, i.e., the geometric structure of the high-dimensional space. No application of LLE to the processing of NIR spectra has been reported. By combining LLE and PLS, a novel nonlinear modeling method for NIR spectra, LLE-PLS, is proposed. In this method, LLE reduces the dimensionality of the NIR spectra and PLS builds the regression model. LLE-PLS was applied to correlate NIR spectra with the concentration of salvianolic acid B in the eluate from column chromatography of Salvianolate. The results showed that LLE-PLS outperformed other preprocessing methods, such as multiplicative scatter correction, first derivative, vector normalization, minimum-maximum normalization, detrend, debias, and second derivative. After parameter optimization, LLE-PLS accurately predicted the concentration of salvianolic acid B, with a minimum root mean square error of cross-validation (RMSECV) of 0.128 mg·mL⁻¹ and an r² of 0.9988, suggesting that LLE-PLS is better than PLS in modeling and prediction. Two parameters of LLE-PLS, the number of nearest neighbors k and the output dimension d, affect the performance of the method. The study showed that RMSECV is robust to the choice of k, whereas an excessively low or high output dimension d leads to a larger error because of insufficient or excessive information extraction.
In conclusion, LLE-PLS can effectively model the nonlinear correlations between spectra and the physicochemical properties of samples, and coupling NIR spectroscopy with LLE-PLS modeling makes online monitoring of the column chromatography of Salvianolate feasible.