Li Peng-Fei, Wang Jia-Hua, Cao Nan-Ning, Han Dong-Hai
College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China.
Guang Pu Xue Yu Guang Pu Fen Xi. 2009 Oct;29(10):2637-41.
The feasibility of using efficient variable selection in Vis/NIR spectroscopy for rapid and accurate determination of internal fruit quality, such as the soluble solids content (SSC) of plums, was investigated. A new strategy is proposed in the present paper, i.e. two-stage variable selection using backward interval partial least squares (BiPLS) combined with a genetic algorithm (GA). First, the whole spectral region is split into equidistant sub-regions and all BiPLS regression models are developed, which locates the informative regions, i.e. those used to construct the PLS model with the lowest error. Second, GA is used to select variables within these informative regions, and the selected variables serve as the regression variables of a multiple linear regression (MLR) model. The Vis/NIR spectra, containing 225 individual data points, were processed by Savitzky-Golay smoothing and a second-order derivative, and 9 sub-regions were selected by the BiPLS procedure when the spectra were divided into 25 sub-regions. The GA procedure was run 100 times, and the 12 variables with the highest occurrence frequency were taken as its output. To simplify MLR modeling, where GA selected adjacent wavelengths, only the wavelength variable with the maximum occurrence frequency was retained. Finally, 638, 734, 752, 868, 910, 916 and 938 nm were used to build an MLR model. The results show that the MLR model produced by BiPLS-GA performs well for SSC, with a correlation coefficient (R) of 0.984, a root mean square error of calibration (RMSEC) of 0.364 and a root mean square error of prediction (RMSEP) of 0.471, and that it outperforms models built using stepwise regression analysis (SRA). This work demonstrates that BiPLS-GA can determine optimal variables in Vis/NIR spectra and improve the accuracy of the model.
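The bookkeeping around the two selection stages can be illustrated with a minimal sketch. It shows only the equidistant split (225 points into 25 sub-regions gives 9 points each) and the frequency-based reduction of repeated GA outputs, including the rule of keeping one wavelength per group of adjacent ones. The function names, the `min_count` threshold, the `max_gap` definition of "adjacent" and the toy GA runs are all assumptions made for illustration; none of them come from the paper, and the BiPLS and GA fitting steps themselves are not implemented here.

```python
from collections import Counter


def split_intervals(n_points, n_intervals):
    """Split indices 0..n_points-1 into contiguous, near-equal intervals."""
    base, extra = divmod(n_points, n_intervals)
    intervals, start = [], 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)
        intervals.append(list(range(start, start + size)))
        start += size
    return intervals


def selection_frequency(runs):
    """Count in how many GA runs each variable index was selected."""
    return Counter(idx for run in runs for idx in set(run))


def collapse_adjacent(freq, min_count, max_gap=1):
    """Keep variables selected at least min_count times (assumed threshold);
    within each group of neighbouring indices keep only the most frequent."""
    kept = sorted(i for i, c in freq.items() if c >= min_count)
    groups = []
    for i in kept:
        if groups and i - groups[-1][-1] <= max_gap:
            groups[-1].append(i)  # extends the current group of neighbours
        else:
            groups.append([i])    # starts a new group
    return [max(g, key=lambda i: freq[i]) for g in groups]


# Stage 1 bookkeeping: 225 spectral points split into 25 sub-regions.
intervals = split_intervals(225, 25)

# Stage 2 bookkeeping: hypothetical outputs of four GA runs (toy data).
toy_runs = [[10, 11, 40], [11, 40, 41], [11, 12, 40, 90], [10, 11, 41, 90]]
freq = selection_frequency(toy_runs)
selected = collapse_adjacent(freq, min_count=2)
print(selected)  # [11, 40, 90]
```

In the toy data, indices 10 and 11 are neighbours and 11 is selected more often, so only 11 survives; the same holds for 40 over 41. In practice the indices would map back to wavelengths, and the surviving variables would become the regressors of the MLR model.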