Zhu Yaolin, Chen Long, Chen Xin, Chen Jinni, Zhang Hongsong
School of Electronics and Information, Xi'an Polytechnic University, Xi'an, 710000, China.
Shanghai Ranzi Industrial Co., Ltd, Shanghai, 201800, China.
Heliyon. 2024 Jul 14;10(14):e34537. doi: 10.1016/j.heliyon.2024.e34537. eCollection 2024 Jul 30.
Cashmere and wool fibers have similar chemical compositions, so they are difficult to distinguish by their absorption peaks and band positions in near-infrared spectroscopy. Existing studies commonly use wavelength selection or feature extraction algorithms to obtain significant spectral features, but traditional algorithms often overlook the correlations between wavelengths, which leads to weak adaptability and a tendency to converge to local optima. To address this problem, this paper proposes a recognition algorithm based on optimal wavelength selection, which removes redundant information and helps the model capture the patterns and key features of the data. The wavelengths are first ranked by the information gain ratio computed for each wavelength. The sorted wavelengths are then grouped based on equal density, which ensures that all wavelengths within each group have equal information and avoids over-focusing on individual groups. A group genetic algorithm is then used to find highly informative wavelengths and to search for the optimal combination of groups, so that the entire spectral range is explored. Finally, combined with a partial least squares discriminant analysis (PLS-DA) model, the method reached a recognition accuracy of 97.3%. The results indicate that, compared to traditional methods such as CARS, SPA, and GA, our method effectively reduces redundant information, selects fewer but more informative wavelengths, and improves classification accuracy and model adaptability.
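The ranking and grouping steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give these details, so several choices here are assumptions: absorbance values are discretized with equal-width bins before the information gain ratio is computed, "equal density" grouping is read as splitting the ranked list into groups holding equal numbers of wavelengths, and the names `gain_ratio`, `rank_and_group`, `n_bins`, and `n_groups` are hypothetical. The group genetic algorithm and the PLS-DA classifier are not shown.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (in bits) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(feature, labels, n_bins=10):
    """Information gain ratio of one wavelength.

    Assumption: the continuous absorbance is discretized into
    n_bins equal-width bins; the paper may use another scheme.
    """
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    binned = np.digitize(feature, edges[1:-1])  # bin index 0..n_bins-1
    # Conditional entropy H(labels | bin), weighted by bin frequency.
    cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        cond += mask.mean() * entropy(labels[mask])
    split_info = entropy(binned)  # intrinsic information of the split
    if split_info == 0.0:         # constant feature carries no information
        return 0.0
    return (entropy(labels) - cond) / split_info

def rank_and_group(X, y, n_groups=5, n_bins=10):
    """Rank wavelengths (columns of X) by gain ratio, then split the
    ranked indices into n_groups groups of (nearly) equal size.

    Assumption: equal-size groups stand in for "equal density" grouping.
    """
    scores = np.array([gain_ratio(X[:, j], y, n_bins)
                       for j in range(X.shape[1])])
    order = np.argsort(-scores, kind="stable")  # most informative first
    groups = np.array_split(order, n_groups)
    return scores, groups
```

Here `X` is assumed to be a samples-by-wavelengths matrix of spectra and `y` a vector of class labels (cashmere or wool). The returned groups of column indices would then be the units over which a group-level search selects combinations.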