Jiao Yujie, Zhao Yuqing, Jia Aoying, Wang Tianyun, Li Jiashun, Xiang Kaiming, Deng Hangyu, He Maochang, Jiang Rui, Zhang Yue
Faculty of Mechanical and Electrical Engineering, Yunnan Agricultural University, Kunming, China.
Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province, Kunming, China.
PLoS One. 2025 May 14;20(5):e0322198. doi: 10.1371/journal.pone.0322198. eCollection 2025.
A novel shifted-window (Swin) Transformer coffee bean grading model, Swin-HSSAM, is proposed to address the difficulty of accurately classifying green coffee beans and the low identification accuracy of existing methods. The model uses the Swin Transformer as its backbone network; fuses features from the second, third, and fourth stages with the high-level screening-feature pyramid networks module; and applies a selective attention module (SAM) to enhance the discriminative power of the feature outputs before classification. Fusion Loss is employed as the loss function. Experimental results on a proprietary coffee bean dataset show that Swin-HSSAM achieved an average grading accuracy of 96.34% across the three grade levels and the nine defect-subdivision levels, outperforming the AlexNet, VGG16, ResNet50, MobileNet-v2, Vision Transformer (ViT), and CrossViT models by 3.86, 2.56, 0.44, 4.05, 5.36, and 5.40 percentage points, respectively. On a public coffee bean dataset, Swin-HSSAM improved the average grading accuracy over the same models by 1.01, 0.13, 4.75, 0.85, 0.73, and 0.27 percentage points, respectively. These results indicate that the Swin-HSSAM model not only achieves high grading accuracy but also generalizes broadly, providing a novel solution for the automated grading and identification of green coffee beans.
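The pipeline described above (backbone stage features → multi-stage fusion → selective attention → classifier) can be sketched in miniature. This is a hypothetical NumPy illustration, not the paper's implementation: the projection widths, the sum-based fusion, the softmax channel gate standing in for the screening-feature pyramid and SAM modules, and the random classifier weights are all assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_stages(feats, dim=64, seed=0):
    """Toy stand-in for the high-level screening-feature pyramid fusion:
    project each stage's pooled feature vector to a common width and sum."""
    rng = np.random.default_rng(seed)
    fused = np.zeros(dim)
    for f in feats:
        W = rng.standard_normal((f.shape[0], dim)) / np.sqrt(f.shape[0])
        fused += f @ W
    return fused

def selective_attention(x):
    """Toy selective attention: reweight channels by a softmax gate
    (a placeholder for the SAM module, whose details are not given here)."""
    return x * softmax(x)

def classify(x, n_classes=3, seed=1):
    # Linear head with illustrative random weights; returns a grade index.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((x.shape[0], n_classes))
    return int(np.argmax(x @ W))

# Pooled features from Swin stages 2-4; the widths are illustrative only.
rng = np.random.default_rng(42)
stage_feats = [rng.standard_normal(d) for d in (192, 384, 768)]

fused = fuse_stages(stage_feats)        # fuse stages into one vector
attended = selective_attention(fused)   # emphasize discriminative channels
label = classify(attended)              # one of the three grade levels
```

The sketch shows only the data flow; in the actual model each step is a learned module trained end-to-end under the Fusion Loss.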