State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin, 300387, P. R. China.
Anal Methods. 2021 Mar 21;13(11):1374-1380. doi: 10.1039/d1ay00017a. Epub 2021 Mar 2.
Ensemble modeling has gained increasing attention as a way to improve the performance of quantitative models in near-infrared (NIR) spectral analysis. Based on Monte Carlo (MC) resampling, the least absolute shrinkage and selection operator (LASSO) and partial least squares (PLS), a new ensemble strategy named MC-LASSO-PLS is proposed for multivariate calibration of NIR spectra. In this method, the training subsets used to build the sub-models are generated by sampling both the samples and the variables, which ensures the diversity of the sub-models. Specifically, a given number of samples is randomly selected from the training set to form a sample subset. LASSO is then used to shrink the variables of the sample subset, yielding the training subset on which a PLS sub-model is built. This process is repeated N times to obtain N sub-models. Finally, the predictions of the sub-models are combined by simple averaging to produce the final prediction. The prediction ability of the proposed method was compared with that of the LASSO-PLS, MC-PLS and PLS models on NIR spectra of corn, blend oil and orange juice samples. The comparison demonstrates the superior prediction ability of MC-LASSO-PLS.