College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China.
Anal Methods. 2023 Feb 9;15(6):719-728. doi: 10.1039/d2ay01805e.
The prediction accuracy of calibration models for near-infrared (NIR) spectroscopy typically depends on the morphology and homogeneity of the samples. To enable non-destructive and rapid analysis of non-homogeneous tobacco samples, a method is proposed here that predicts tobacco filament samples using reliable models built on the corresponding tobacco powder. First, because a simple and robust calibration model with excellent performance is required, the key feature variables were screened, starting from full-wavelength partial least squares regression (Full-PLSR), by three methods: competitive adaptive reweighted sampling (CARS), variable combination population analysis-iteratively retaining informative variables (VCPA-IRIV), and variable combination population analysis-genetic algorithm (VCPA-GA). PLSR models for predicting the total sugar content in tobacco were then established on the three optimal wavelength sets and named CARS-PLSR, VCPA-IRIV-PLSR and VCPA-GA-PLSR, respectively. Subsequently, these models were combined with different calibration transfer algorithms, namely calibration transfer based on canonical correlation analysis (CTCCA), slope/bias correction (S/B) and the non-supervised parameter-free framework for calibration enhancement (NS-PFCE), to identify the best prediction model for the tobacco filament samples. Compared with the other two transfer algorithms, NS-PFCE performed best under the various wavelength conditions. The most successful approach for predicting the tobacco filament samples was VCPA-IRIV-PLSR coupled with NS-PFCE, which obtained the highest determination coefficient (R² = 0.9340) and the lowest root mean square error of prediction (RMSEP = 0.8425). VCPA-IRIV simplifies the calibration model and improves the efficiency of model transfer (31 variables). Furthermore, it ensures the prediction accuracy of the tobacco filament samples when combined with NS-PFCE. In summary, calibration transfer based on optimized feature variables can eliminate prediction errors caused by differences in sample morphology and proves more beneficial for online application in the tobacco industry.
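As an illustration of the simplest of the transfer algorithms named above, the following is a minimal sketch of slope/bias (S/B) correction together with the two reported metrics, R² and RMSEP. It is not the authors' implementation. The data are synthetic, and the function names, the stand-in `powder_model_on_filament`, and the choice to regress reference values on the transfer-set predictions are all assumptions made for the example.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_slope_bias(pred_transfer, ref_transfer):
    """Fit ref ~ slope * pred + bias on a small transfer set (assumed variant of S/B)."""
    slope, bias = np.polyfit(pred_transfer, ref_transfer, deg=1)
    return float(slope), float(bias)

def apply_slope_bias(pred, slope, bias):
    """Apply the linear correction to new predictions."""
    return slope * np.asarray(pred, dtype=float) + bias

# Synthetic total-sugar reference values (illustrative only, not data from the study).
rng = np.random.default_rng(0)
y_transfer = rng.uniform(10.0, 30.0, size=20)  # transfer samples with known reference values
y_test = rng.uniform(10.0, 30.0, size=50)      # held-out filament samples

def powder_model_on_filament(y):
    """Hypothetical stand-in for a powder-based PLSR model applied to filament
    spectra: its predictions carry a systematic slope and offset error plus noise."""
    return 0.8 * y + 2.0 + rng.normal(0.0, 0.3, size=y.shape)

pred_transfer = powder_model_on_filament(y_transfer)
pred_test = powder_model_on_filament(y_test)

slope, bias = fit_slope_bias(pred_transfer, y_transfer)
pred_test_corrected = apply_slope_bias(pred_test, slope, bias)

print("before S/B: R2 =", r_squared(y_test, pred_test), "RMSEP =", rmsep(y_test, pred_test))
print("after  S/B: R2 =", r_squared(y_test, pred_test_corrected),
      "RMSEP =", rmsep(y_test, pred_test_corrected))
```

S/B only rescales and shifts the predicted values, so it can remove a linear systematic error but cannot repair spectral differences between powder and filament samples. That limitation is consistent with the abstract's finding that NS-PFCE outperformed it.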