Pang Kunkun, Liu Yisen, Zhou Songbin, Liao Yixiao, Yin Zexuan, Zhao Lulu, Chen Hong
Guangdong Key Laboratory of Modern Control Technology, Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou 510070, China.
Foods. 2024 Nov 11;13(22):3598. doi: 10.3390/foods13223598.
Conventional food fraud detection using hyperspectral imaging (HSI) relies on the discriminative power of machine learning. However, these approaches often assume a balanced class distribution obtained under ideal laboratory conditions, an assumption that rarely holds in real-world scenarios with diverse label distributions. As a result, performance suffers when less frequent classes are overshadowed by the majority class during training. The critical research challenge is therefore how to develop an effective classifier on a small-scale imbalanced dataset without significant bias toward the dominant class. In this paper, we propose a novel nondestructive detection approach, the Dice Loss Improved Self-Supervised Learning-Based Prototypical Network (Proto-DS), designed to address this imbalanced learning challenge. Combining these components mitigates the label bias toward the most frequent class and further improves robustness. We validate the proposed method on three hyperspectral food image datasets that we collected, with varying degrees of data imbalance: Citri Reticulatae Pericarpium (Chenpi), Chinese herbs, and coffee beans. Comparisons with state-of-the-art imbalanced learning techniques, including the Synthetic Minority Oversampling Technique (SMOTE) and class-importance reweighting, show that our method performs better. Notably, Proto-DS consistently outperforms conventional approaches, achieving the best average balanced accuracy of 88.18% across various training sample sizes, whereas the Logistic Model Tree (LMT), Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN) approaches attain only 59.42%, 60.38%, and 66.34%, respectively. Overall, self-supervised learning is key to improving imbalanced learning performance and outperforms related approaches, while both prototypical networks and the Dice loss further enhance classification performance. Intriguingly, self-supervised learning can provide complementary information to existing imbalanced learning approaches. Combining these approaches may serve as a potential solution for building effective models with limited training data.
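To make the reported metric and the loss term concrete, the following is a minimal pure-Python sketch of balanced accuracy (the mean of per-class recall) and a generic multi-class soft Dice loss. The abstract does not give the exact formulation used in Proto-DS, so the function names `balanced_accuracy` and `soft_dice_loss`, the smoothing constant `eps`, and the per-class averaging are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def soft_dice_loss(probs, onehot, eps=1e-7):
    """Generic soft Dice loss (assumed form): 1 - mean per-class Dice overlap.

    probs  : list of per-sample class-probability lists
    onehot : list of per-sample one-hot label lists
    eps    : hypothetical smoothing constant to avoid division by zero
    """
    n_classes = len(probs[0])
    dice_scores = []
    for c in range(n_classes):
        intersection = sum(p[c] * t[c] for p, t in zip(probs, onehot))
        denominator = sum(p[c] for p in probs) + sum(t[c] for t in onehot)
        dice_scores.append((2 * intersection + eps) / (denominator + eps))
    return 1 - sum(dice_scores) / n_classes


# Toy imbalanced case: 9 majority samples, 1 minority sample,
# and a classifier that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

onehot = [[1.0, 0.0]] * 9 + [[0.0, 1.0]]
majority_probs = [[1.0, 0.0]] * 10
loss_majority_only = soft_dice_loss(majority_probs, onehot)
loss_perfect = soft_dice_loss(onehot, onehot)
```

In this toy case the always-majority classifier has high plain accuracy but a balanced accuracy of only one half, and its Dice loss stays large because the minority class contributes a near-zero overlap term. This is the kind of majority-class bias the abstract describes.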