Li Jiang-bo, Guo Zhi-ming, Huang Wen-qian, Zhang Bao-hua, Zhao Chun-jiang
Guang Pu Xue Yu Guang Pu Fen Xi. 2015 Feb;35(2):372-8.
When spectroscopy is used to analyze fruit quality quantitatively or qualitatively, obtaining a simple and effective calibration model is critical for applying and maintaining that model. Taking strawberry as the research object, this study focused on selecting the key variables and characteristic samples for quantitative determination of soluble solids content (SSC). The competitive adaptive reweighted sampling (CARS) algorithm was first used to select spectral variables. Samples of the calibration set were then selected by the successive projections algorithm (SPA), yielding 98 characteristic samples. Next, based on the selected variables and characteristic samples, a second round of variable selection was performed with SPA, yielding 25 key variables. To verify the performance of the proposed CARS algorithm, the variable selection algorithms Monte Carlo-uninformative variable elimination (MC-UVE) and SPA were used for comparison. The results showed that CARS could eliminate uninformative variables and remove collinear information at the same time. Similarly, to assess the performance of SPA for selecting characteristic samples, SPA was compared with the classical Kennard-Stone algorithm. The results showed that SPA could be used to select characteristic samples in the calibration set. Finally, partial least squares (PLS) and multiple linear regression (MLR) models for quantitatively predicting SSC in strawberry were built on the variable/sample subset (25/98). Models built with only 0.59% of the original variables and 65.33% of the original samples performed better than models built with all of the original variables and samples. The MLR model performed best, with R²(pre) = 0.9097, RMSEP = 0.3484, and RPD = 3.3278.
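The three figures of merit quoted for the MLR model (R² of prediction, RMSEP, and RPD) can be illustrated with a minimal sketch. The abstract does not give the exact formulas used, so the definitions below are assumptions: R² is taken as 1 - SS_res/SS_tot (some papers report the squared correlation coefficient instead), RMSEP as the root mean squared error over the prediction set, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name `prediction_metrics` and the SSC values in the usage lines are hypothetical and are not data from the study.

```python
import math
import statistics


def prediction_metrics(y_true, y_pred):
    """Return (r2, rmsep, rpd) for reference values and model predictions.

    Assumed definitions (not confirmed by the abstract):
      r2    = 1 - SS_res / SS_tot
      rmsep = sqrt(SS_res / n)
      rpd   = sample standard deviation of y_true / rmsep
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(y_true)
    mean_true = statistics.fmean(y_true)

    # Residual and total sums of squares over the prediction set.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_res == 0 or ss_tot == 0:
        raise ValueError("metrics are undefined for perfect predictions or constant references")

    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    rpd = statistics.stdev(y_true) / rmsep  # stdev uses the n - 1 denominator
    return r2, rmsep, rpd


# Hypothetical SSC reference values and predictions, for illustration only.
reference = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 8.5, 10.5, 10.5]
r2, rmsep, rpd = prediction_metrics(reference, predicted)
print(f"R2 = {r2:.4f}, RMSEP = {rmsep:.4f}, RPD = {rpd:.4f}")
```

Under these definitions, a higher RPD means the prediction error is small relative to the spread of the reference values, which is why it is commonly reported alongside R² and RMSEP.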