Gao Peng, Wang Na, Lu Yang, Liu Jinming, Hou Rui, Du Xinyue, Hao Yingying
College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China.
Anal Methods. 2025 Aug 21;17(33):6672-6683. doi: 10.1039/d5ay00848d.
To identify the geographic origin of millet non-destructively, near-infrared (NIR) spectroscopy was used to collect raw spectral data from millet samples. Because NIR spectra are high-dimensional, redundant, and subject to overlapping peaks, feature wavelengths were selected with the Competitive Adaptive Reweighted Sampling (CARS) algorithm, the Uninformative Variable Elimination (UVE) algorithm, and the Whale Optimization Algorithm (WOA), yielding 26, 158, and 123 feature wavelengths, respectively. To improve feature extraction further, strategies such as chaotic mapping were integrated into WOA to form an improved WOA (IWOA), which reduced the number of selected wavelengths from 123 to 27. To improve model accuracy, the Crested Porcupine Optimizer (CPO) was combined with the Least Squares Support Vector Machine (LSSVM) to build a CPO-LSSVM model for millet origin identification. After wavelength selection, both the LSSVM and CPO-LSSVM models identified origin better than the full-spectrum models. The best performance came from IWOA wavelength selection combined with CPO-LSSVM, which achieved an accuracy of 99.03%, with precision, recall, and F1 score all reaching 99.20%; compared with the full-spectrum LSSVM model, accuracy, precision, recall, and F1 score improved by 21.67%, 19.86%, 21.88%, and 20.87%, respectively. The proposed IWOA wavelength selection method and CPO-LSSVM model were also validated on public datasets. The results show that IWOA improves model performance while selecting far fewer wavelengths than WOA.
The CPO-LSSVM model can rapidly and accurately identify the origin of millet, enabling precise traceability of its provenance and providing a reference for the origin identification of other agricultural products.
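For readers unfamiliar with LSSVM, the following is a minimal sketch of the classifier stage only, not the authors' implementation. It assumes a binary problem with labels in {-1, +1}, an RBF kernel, and the regression-style LSSVM linear system; the function names, the synthetic data, the `selected` wavelength indices, and the fixed values of `C` and `gamma_k` are all hypothetical. In the abstract's pipeline, the wavelength subset would come from a selector such as IWOA, and the two hyperparameters would be tuned by CPO rather than fixed by hand.

```python
import numpy as np


def rbf_kernel(A, B, gamma_k):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma_k * np.maximum(sq, 0.0))


def lssvm_fit(X, y, C=10.0, gamma_k=0.5):
    """Fit a binary LSSVM by solving one linear system (labels in {-1, +1}).

    System solved:  [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y]
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma_k)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C  # ridge term I/C comes from the squared-error loss
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual weights alpha


def lssvm_predict(X_train, b, alpha, X_new, gamma_k=0.5):
    """Classify new spectra by the sign of the kernel expansion."""
    scores = rbf_kernel(X_new, X_train, gamma_k) @ alpha + b
    return np.where(scores >= 0.0, 1, -1)


# Synthetic stand-in for spectra: 40 samples x 20 "wavelengths" (not real data).
rng = np.random.default_rng(0)
n_per_class, n_wavelengths = 20, 20
selected = np.array([2, 5, 7, 11, 13])  # hypothetical selected-wavelength indices
y = np.concatenate((np.ones(n_per_class), -np.ones(n_per_class)))
X = rng.normal(0.0, 0.3, size=(2 * n_per_class, n_wavelengths))
X[:, selected] += y[:, None]  # class signal lives only in the selected columns

# Train and evaluate on the selected wavelengths only.
b, alpha = lssvm_fit(X[:, selected], y)
pred = lssvm_predict(X[:, selected], b, alpha, X[:, selected])
train_acc = float((pred == y).mean())
print(f"training accuracy on selected wavelengths: {train_acc:.2f}")
```

Unlike a standard SVM, which requires a quadratic program, LSSVM training reduces to a single linear solve, so a metaheuristic can evaluate many hyperparameter candidates cheaply. Identifying more than two origins would need an extension such as one-vs-rest, which this sketch omits.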