Zhao Rongyue, Li Wangsen, Xu Jinchai, Chen Linjie, Wei Xuan, Kong Xiangzeng
School of Future Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350100, China.
Anal Methods. 2025 Jan 30;17(5):1090-1100. doi: 10.1039/d4ay01970a.
Near-infrared (NIR) spectroscopy is widely applied across many fields because it is non-destructive, simple to operate, and fast. However, current spectral analysis techniques still depend on complex preprocessing and feature selection of the spectral data. Data-driven deep learning can extract features automatically from raw spectra, but it typically requires large amounts of labeled data for training, which limits its use in spectral analysis. To address this issue, we propose a self-supervised learning (SSL) framework based on convolutional neural networks (CNN) that improves spectral analysis performance when only small labeled samples are available. The method comprises two learning stages: pre-training and fine-tuning. In the pre-training stage, a large amount of pseudo-labeled data is used to learn intrinsic spectral features; in the fine-tuning stage, a smaller set of labeled data is used to complete the final model training. On a dataset of three tea varieties that we collected ourselves, the proposed model achieved a classification accuracy of 99.12%. In addition, experiments on three public datasets showed that the SSL model significantly outperforms traditional machine learning methods, achieving accuracies of 97.83%, 98.14%, and 99.89%, respectively. Comparative experiments further confirmed the effectiveness of the pre-training stage, which improved accuracy by up to 10.41%. These results highlight the potential of the proposed method for handling small-sample spectral data and offer a viable route to improved spectral analysis.
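The two-stage workflow described in the abstract (pre-training on pseudo-labeled spectra, then fine-tuning on a small labeled set) might be sketched roughly as follows. The abstract does not say how the pseudo-labels are generated or what the CNN looks like, so everything specific here is an assumption made for illustration: the pretext task (predicting which of three simple transformations was applied to a spectrum), the small dense encoder that stands in for the CNN to keep the sketch dependency-free, the function names, and the hyperparameters.

```python
import numpy as np

def make_pretext_dataset(spectra, rng):
    """Assumed pretext task: apply a random transform; its index is the pseudo-label."""
    half = spectra.shape[1] // 2
    transforms = [
        lambda s: s,                                     # 0: unchanged
        lambda s: s[::-1],                               # 1: wavelength axis reversed
        lambda s: np.concatenate([s[half:], s[:half]]),  # 2: two halves swapped
    ]
    labels = rng.integers(0, len(transforms), size=len(spectra))
    x = np.stack([transforms[k](s) for s, k in zip(spectra, labels)])
    return x, labels, len(transforms)

def init_encoder(n_in, n_hidden, rng):
    # Dense layer used as a stand-in for the paper's CNN feature extractor.
    return {"W": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
            "b": np.zeros(n_hidden)}

def init_head(n_hidden, n_classes):
    return {"W": np.zeros((n_hidden, n_classes)), "b": np.zeros(n_classes)}

def forward(enc, head, x):
    h = np.maximum(0.0, x @ enc["W"] + enc["b"])        # ReLU features
    logits = h @ head["W"] + head["b"]
    logits -= logits.max(axis=1, keepdims=True)         # numerically stable softmax
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)

def train(enc, head, x, y, epochs, lr):
    """Full-batch gradient descent on cross-entropy; updates enc and head in place."""
    onehot = np.eye(head["W"].shape[1])[y]
    for _ in range(epochs):
        h, p = forward(enc, head, x)
        d_logits = (p - onehot) / len(x)
        d_h = (d_logits @ head["W"].T) * (h > 0)         # backprop through ReLU
        head["W"] -= lr * h.T @ d_logits
        head["b"] -= lr * d_logits.sum(axis=0)
        enc["W"] -= lr * x.T @ d_h
        enc["b"] -= lr * d_h.sum(axis=0)

def predict(enc, head, x):
    return forward(enc, head, x)[1].argmax(axis=1)

def pretrain_then_finetune(unlabeled, x_small, y_small, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    n_hidden = 32                                        # assumed feature width
    # Stage 1: pre-train the encoder on pseudo-labeled spectra.
    x_pre, y_pre, n_pretext = make_pretext_dataset(unlabeled, rng)
    enc = init_encoder(unlabeled.shape[1], n_hidden, rng)
    train(enc, init_head(n_hidden, n_pretext), x_pre, y_pre, epochs=200, lr=0.1)
    # Stage 2: discard the pretext head, fine-tune encoder plus a new head
    # on the small labeled set.
    head = init_head(n_hidden, n_classes)
    train(enc, head, x_small, y_small, epochs=200, lr=0.1)
    return enc, head
```

Calling `pretrain_then_finetune(unlabeled, x_small, y_small, n_classes=3)` with arrays of shape `(n_samples, n_wavelengths)` returns the fine-tuned encoder and classification head, which `predict` can then apply to new spectra. The point of the sketch is the structure: the encoder weights learned in stage 1 are carried into stage 2, while the pretext head is thrown away.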