Identification of cumin and fennel from different regions based on generative adversarial networks and near infrared spectroscopy.

Affiliation

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.

Publication information

Spectrochim Acta A Mol Biomol Spectrosc. 2021 Nov 5;260:119956. doi: 10.1016/j.saa.2021.119956. Epub 2021 May 13.

Abstract

Cumin (Cuminum cyminum) and fennel (Foeniculum vulgare) are widely used seasonings and play an important role in industries such as breeding, cosmetics, winemaking, drug discovery, and nano-synthetic materials. However, studies have shown that cumin and fennel from different regions differ greatly not only in their lipid, phenol and protein content but also in the composition of their essential oils. Precise identification of cumin and fennel from different regions would therefore greatly help quality control, the detection of market fraud, and the industrialization of production. In this experiment, cumin and fennel samples were collected from each region, and a total of 480 NIR spectra were acquired. We combined deep learning and traditional machine learning algorithms with near infrared (NIR) spectroscopy to identify the origin of the samples. To obtain the model with the best generalization performance and classification accuracy, we applied Rubberband baseline correction, used principal component analysis (PCA) to reduce the dimensionality of the corrected spectra, and then built two classification models on the PCA scores: quadratic discriminant analysis (PCA-QDA) and a multilayer perceptron (PCA-MLP). We also fed the baseline-corrected spectra directly into convolutional neural networks (CNN) and generative adversarial networks (GAN). The experimental results show that GAN was more accurate than the PCA-QDA, PCA-MLP and CNN models, reaching a classification accuracy of 100%. In the experiment classifying cumin versus fennel within the same region, all four models achieved good classification results for the three regions with all model parameters left unchanged. The results indicate that when the training data are limited and high-dimensional, the model obtained by GAN through competitive learning generalizes better and classifies more accurately. This approach also offers a new way to address limited training data in future food research and medical diagnosis.
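
The PCA-QDA branch of the pipeline described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the spectra are synthetic, the wavenumber range, the number of principal components, and the 5-fold cross-validation are arbitrary choices, and `rubberband_baseline` is a hypothetical helper that approximates Rubberband correction by subtracting the lower convex hull of each spectrum. The authors' actual preprocessing software and settings are not stated in the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def rubberband_baseline(x, y):
    """Hypothetical helper: baseline as the lower convex hull of (x, y).

    Assumes x is strictly increasing. Uses a monotone-chain scan to find
    the lower hull vertices, then interpolates linearly between them.
    """
    hull = []  # indices of lower-hull vertices, left to right
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:  # b lies on or above the chord a -> i, so drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])


# Synthetic stand-in data (assumption): 3 regions, 40 spectra each.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000.0, 10000.0, 200)
spectra, labels = [], []
for region in range(3):
    for _ in range(40):
        # Region-dependent band height, random sloped baseline, and noise.
        band = (0.6 + 0.2 * region) * np.exp(-(((wavenumbers - 7000.0) / 300.0) ** 2))
        drift = rng.uniform(0.5, 1.5) * 1e-4 * (wavenumbers - 4000.0)
        noise = rng.normal(0.0, 0.01, wavenumbers.size)
        spectra.append(band + drift + noise)
        labels.append(region)
X = np.array(spectra)
y = np.array(labels)

# Step 1: baseline-correct every spectrum.
X_corrected = np.array([s - rubberband_baseline(wavenumbers, s) for s in X])

# Step 2: PCA for dimensionality reduction, then QDA on the scores.
model = make_pipeline(PCA(n_components=5), QuadraticDiscriminantAnalysis())
scores = cross_val_score(model, X_corrected, y, cv=5)
mean_accuracy = scores.mean()
print(f"Mean cross-validated accuracy on synthetic data: {mean_accuracy:.3f}")
```

Wrapping PCA and QDA in one pipeline means the principal components are refit on each training fold, so the held-out fold does not leak into the dimensionality reduction. The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the paper.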