
Improving fine-grained food classification using deep residual learning and selective state space models.

Author Information

Chen Chi-Sheng, Chen Guan-Ying, Zhou Dong, Jiang Di, Chen Daishi, Chang Shao-Hsuan

Affiliations

Neuro Industry, Inc., San Francisco, California, United States of America.

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.

Publication Information

PLoS One. 2025 May 5;20(5):e0322695. doi: 10.1371/journal.pone.0322695. eCollection 2025.

Abstract

BACKGROUND

Food classification is the foundation for developing food vision tasks and plays a key role in the burgeoning field of computational nutrition. Because food is complex and requires fine-grained classification, Convolutional Neural Network (CNN) backbones need additional structural design, whereas Vision Transformers (ViTs), which contain self-attention modules, incur increased computational complexity.

METHODS

We propose the ResVMamba model and validate its performance on a complex food dataset. Unlike previous fine-grained classification models that rely heavily on attention mechanisms or hierarchical feature extraction, our method leverages a novel residual learning strategy within a state-space framework to improve representation learning. This approach enables the model to efficiently capture both global and local dependencies, surpassing the computational efficiency of ViTs while maintaining high accuracy. We also introduce CNFOOD-241, an academically underestimated food dataset, and compare it with other food databases.
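
To make the architectural idea concrete, here is a minimal PyTorch sketch of an extra residual (skip) connection wrapped around a stack of state-space blocks, so that stage-level global state features are fused with the unmodified local input. The SS2DBlock stand-in (a norm plus MLP with matching shapes) and the ResVMambaStage name are illustrative assumptions; the actual selective-scan modules live in the authors' repository, and this sketch is not their implementation.

import torch
import torch.nn as nn

class SS2DBlock(nn.Module):
    # Placeholder for a VMamba-style 2D selective-scan block. This
    # stand-in (LayerNorm + MLP with the same input/output shape) is an
    # assumption; the real module comes from the authors' repository.
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):  # x: (batch, tokens, dim)
        return self.mix(self.norm(x))

class ResVMambaStage(nn.Module):
    # Hypothetical stage: a stack of state-space blocks wrapped in an
    # extra stage-level residual connection, so the stage's global state
    # features are combined with the unmodified local input features.
    def __init__(self, dim, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList(SS2DBlock(dim) for _ in range(depth))

    def forward(self, x):
        residual = x                 # preserve local input features
        for blk in self.blocks:
            x = x + blk(x)           # per-block residual, standard in VMamba
        return x + residual          # extra residual across the whole stage

# Shape check: 14x14 patch tokens with 96 channels
tokens = torch.randn(2, 196, 96)
print(ResVMambaStage(96)(tokens).shape)  # torch.Size([2, 196, 96])

The design mirrors classic deep residual learning: the per-block skips ease optimization, while the stage-level skip lets later layers access unprocessed local features directly.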

RESULTS

The proposed ResVMamba surpasses current state-of-the-art (SOTA) models, achieving a Top-1 classification accuracy of 81.70% and a Top-5 accuracy of 96.83%. These results establish a new SOTA benchmark for food recognition on the CNFOOD-241 dataset.
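
For context, Top-1 accuracy counts a prediction as correct only when the true label is the single highest-scoring class, while Top-5 accepts any of the five highest-scoring classes. A minimal, self-contained sketch of the metric in PyTorch (illustrative only, not the authors' evaluation code):

import torch

def topk_accuracy(logits, labels, k=5):
    # Fraction of samples whose true label appears among the k
    # highest-scoring classes.
    topk = logits.topk(k, dim=1).indices            # (batch, k)
    hits = (topk == labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

logits = torch.randn(8, 241)                        # 241 classes, as in CNFOOD-241
labels = torch.randint(0, 241, (8,))
print(topk_accuracy(logits, labels, k=1))           # Top-1 accuracy
print(topk_accuracy(logits, labels, k=5))           # Top-5 accuracy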

CONCLUSIONS

We pioneer the integration of a residual learning framework within the VMamba model to concurrently harness both global and local state features. The code is available on GitHub: https://github.com/ChiShengChen/ResVMamba.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0016/12052122/6d9843980f25/pone.0322695.g001.jpg
