Department of Physics, State Key Laboratory of Optoelectronic Materials and Technologies, Sun Yat-sen University, Guangzhou 510275, China.
Guangdong Institute for Drug Control, Guangzhou 510663, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2022 Apr 15;271:120936. doi: 10.1016/j.saa.2022.120936. Epub 2022 Jan 29.
The feasibility of identifying the geographical origin and storage age of tangerine peel was explored using a handheld near-infrared (NIR) spectrometer combined with machine learning. A handheld NIR spectrometer (900-1700 nm) was used to scan the outer surface of tangerine peel and collect the corresponding NIR diffuse reflectance spectra. Principal component analysis (PCA) combined with Mahalanobis distance was used to detect outliers. The accuracies of all models on the anomaly set were much lower than those on the calibration and test sets, indicating that the outliers were effectively identified. After removing the outliers, PCA was performed on tangerine peels from different origins, and on peels from the same origin with different storage ages, to make an initial exploration of their clustering characteristics. The results showed that tangerine peels from the same origin or of the same storage age had the potential to cluster, indicating that their spectral data shared a certain similarity, which laid the foundation for subsequent modeling and identification. However, quite a few samples of different origins or different storage ages overlapped and could not be distinguished from one another. To achieve qualitative identification of origin and storage age, Savitzky-Golay convolution smoothing with first derivative (SGFD) and standard normal variate (SNV) were used to preprocess the raw spectra. Random forest (RF), K-nearest neighbor (KNN) and linear discriminant analysis (LDA) were used to establish the discriminant models. The results showed that SGFD-LDA could accurately distinguish the origin and storage age of tangerine peel at the same time. The origin identification accuracy was 96.99%. The storage age identification accuracy was 100% for Guangdong tangerine peel and 97.15% for Sichuan tangerine peel. This indicated that near-infrared spectroscopy (NIRS) combined with machine learning can simultaneously and rapidly identify the origin and storage age of tangerine peel on site.
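The outlier screening step named in the abstract (PCA combined with Mahalanobis distance) can be illustrated with a minimal sketch. The abstract does not report the number of principal components or the distance cutoff, so `n_components=3`, the cutoff of 3.0, the function name `pca_mahalanobis` and the synthetic spectra below are all assumptions made for illustration, not the authors' settings or data.

```python
import numpy as np


def pca_mahalanobis(spectra, n_components=3):
    """Mahalanobis distance of each spectrum from the centroid in PCA score space.

    spectra: array of shape (n_samples, n_wavelengths).
    n_components is an illustrative choice, not a value from the study.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center each wavelength
    # PCA via singular value decomposition of the centered data
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # scores already have zero mean
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    # Squared distance d_i^2 = s_i^T C^{-1} s_i for every sample i
    d2 = np.einsum("ij,jk,ik->i", scores, cov_inv, scores)
    return np.sqrt(d2)


# Synthetic stand-in for NIR spectra (assumption: no real data is available here)
rng = np.random.default_rng(0)
wavelengths = np.linspace(900, 1700, 100)
band = np.exp(-((wavelengths - 1450) / 120) ** 2)
normal = band * (1 + 0.05 * rng.normal(size=(60, 1))) + 0.01 * rng.normal(size=(60, 100))
odd = np.exp(-((wavelengths - 1200) / 120) ** 2)  # one spectrum with a displaced band
spectra = np.vstack([normal, odd])

distances = pca_mahalanobis(spectra)
is_outlier = distances > 3.0  # hypothetical cutoff
print("Flagged sample indices:", np.flatnonzero(is_outlier))
```

In this sketch the displaced-band spectrum (the last row) receives the largest distance. In practice the cutoff would need to be chosen for the data set at hand, for example from the distribution of distances in the calibration samples.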
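The preprocessing and classification steps described in the abstract (SNV, SGFD and LDA) can be sketched as follows, using `scipy.signal.savgol_filter` and scikit-learn's `LinearDiscriminantAnalysis`. The abstract does not state the smoothing window, polynomial order or data split, so `window_length=11`, `polyorder=2`, the 75/25 split, the helper names `snv` and `sgfd`, and the synthetic two-class spectra are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split


def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def sgfd(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing with first derivative along the wavelength axis.

    window_length and polyorder are illustrative, not values from the study.
    """
    X = np.asarray(spectra, dtype=float)
    return savgol_filter(X, window_length=window_length, polyorder=polyorder,
                         deriv=1, axis=1)


# Synthetic two-class spectra (assumption: stand-in for two origins)
rng = np.random.default_rng(0)
wavelengths = np.linspace(900, 1700, 80)
labels = np.repeat([0, 1], 100)
centers = np.where(labels == 0, 1400.0, 1460.0)[:, None]
spectra = np.exp(-((wavelengths - centers) / 60) ** 2)
spectra = spectra * (1 + 0.05 * rng.normal(size=(200, 1)))  # multiplicative scatter
spectra = spectra + 0.1 * rng.normal(size=(200, 1))         # baseline offset
spectra = spectra + 0.01 * rng.normal(size=(200, 80))       # measurement noise

# SGFD-LDA: derivative spectra as features, LDA as the discriminant model
X_train, X_test, y_train, y_test = train_test_split(
    sgfd(spectra), labels, test_size=0.25, random_state=0, stratify=labels)
model = LinearDiscriminantAnalysis().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"SGFD-LDA test accuracy on synthetic data: {accuracy:.2f}")
```

Replacing `sgfd(spectra)` with `snv(spectra)` gives the SNV-LDA variant. The first derivative removes constant baseline offsets, whereas SNV removes per-spectrum offset and scale, which is one reason the two preprocessing choices can lead to different model accuracies.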