Li Chunting, Chen Huazhou, Zhang Youyou, Hong Shaoyong, Ai Wu, Mo Lina
College of Science, Guilin University of Technology, Guilin 541004, China.
College of Science, Guilin University of Technology, Guilin 541004, China; Center for Data Analysis and Algorithm Technology, Guilin University of Technology, Guilin 541004, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2022 Aug 5;276:121247. doi: 10.1016/j.saa.2022.121247. Epub 2022 Apr 8.
Feature selection and sample partitioning are both important for establishing a quantitative analytical model in near-infrared (NIR) spectroscopy. The classical interval partial least squares (iPLS) model for waveband selection can be improved by combining it with the simulated annealing (SA) algorithm. The sample set partitioning based on joint x-y distance (SPXY) method partitions samples according to distances in both the x and y dimensions; it is expected to be optimized using a non-dominated sorting (NS) strategy combined with the immune algorithm (IA). In this study, we investigated a dual model optimization mode for simultaneous feature waveband selection and sample partitioning, and proposed a novel method designated SA-iPLS & SPXY-NSIA. The method explores a population evolution process and uses the candidate individual as the link for the fusion optimization of SA-iPLS and SPXY-NSIA. It screens feature wavebands and identifies a good partition of the modeling samples, forming a combined strategy that jointly optimizes the target waveband and the sample partition. The performance of the SA-iPLS & SPXY-NSIA method was tested on a soil sample dataset. To demonstrate the improvement, the proposed method was compared with two traditional partitioning methods, Kennard-Stone (KS) and SPXY, each combined with SA-iPLS. Experimental results show that the fusion model established by SA-iPLS & SPXY-NSIA performed better than the KS-SA-iPLS and SPXY-SA-iPLS models. The best test results of the fusion model were an RMSET of 0.0107, an RPD of 1.7233 and an R of 0.9097. The proposed method is expected to improve the predictive ability of NIR analytical models.
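For readers unfamiliar with SPXY, the baseline partitioning step named in the abstract can be sketched as follows. This is a minimal illustration of the plain SPXY idea (Kennard-Stone selection on a normalized joint x-y distance), not the authors' SPXY-NSIA optimization, which the abstract does not specify in detail. The function name `spxy_split` and its parameters are hypothetical, and the sketch assumes a single response variable and that the samples are not all identical in x or in y.

```python
import numpy as np

def spxy_split(X, y, n_cal):
    """Pick n_cal calibration samples by plain SPXY; the rest form the test set.

    Hypothetical helper for illustration only. Assumes the largest pairwise
    distance is non-zero in both X and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)
    n = X.shape[0]
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")

    # Pairwise Euclidean distances in spectral space (O(n^2 * p) memory,
    # acceptable for a sketch) and absolute differences in the response.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.abs(y[:, None] - y[None, :])

    # Joint distance: each part is scaled by its maximum so x and y
    # contribute with equal weight.
    d = dx / dx.max() + dy / dy.max()

    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # Kennard-Stone step: repeatedly add the sample whose nearest
    # already-selected neighbour is farthest away.
    while len(selected) < n_cal:
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)

    return np.array(selected, dtype=int), np.array(remaining, dtype=int)
```

Setting `dy` to zero throughout would reduce the selection to the Kennard-Stone method on x alone, which is the other baseline the abstract compares against.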