Nfor Kintoh Allen, Theodore Armand Tagne Poupi, Ismaylovna Kenesbaeva Periyzat, Joo Moon-Il, Kim Hee-Cheol
Department of Computer Engineering, Inje University, Gimhae 50834, Republic of Korea.
Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae 50834, Republic of Korea.
Nutrients. 2025 Jan 20;17(2):362. doi: 10.3390/nu17020362.
Food image recognition, a crucial step in computational gastronomy, has diverse applications across nutritional platforms. Convolutional neural networks (CNNs) are widely used for this task due to their ability to capture hierarchical features. However, they struggle with long-range dependencies and global feature extraction, which are vital for distinguishing visually similar foods or images where the context of the whole dish is crucial, thus necessitating transformer architectures.
This research combines the capabilities of CNNs and transformers to build a robust classification model that handles both short- and long-range dependencies along with global features, accurately classifying food images and enhancing food image recognition for better nutritional analysis.
Our approach, which combines CNNs and Vision Transformers (ViTs), begins with a ResNet50 backbone model. This model is responsible for local feature extraction from the input image. The resulting feature map is then passed to the ViT encoder block, which handles further global feature extraction and classification using multi-head attention and fully connected layers with pre-trained weights.
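The data flow described above can be sketched as follows. This is a minimal, framework-free illustration of the hybrid pipeline, not the authors' implementation: it assumes a ResNet50-style 2048-channel 7x7 feature map, uses random weights, and shows a single attention head where the paper uses multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for the ResNet50 backbone output: a 2048-channel 7x7 feature map.
feat = rng.standard_normal((2048, 7, 7))

# Flatten the spatial grid into 49 tokens of dimension 2048, the form a
# ViT encoder consumes.
tokens = feat.reshape(2048, -1).T            # shape (49, 2048)

# One attention head (illustrative; the model uses multi-head attention).
d = 64
Wq, Wk, Wv = (rng.standard_normal((2048, d)) * 0.01 for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))         # (49, 49) global token-to-token weights
out = attn @ V                               # (49, 64) globally contextualized tokens

# Mean-pool the tokens and classify with a fully connected layer
# (101 classes here is a hypothetical count, e.g. Food-101).
Wc = rng.standard_normal((d, 101)) * 0.01
logits = out.mean(axis=0) @ Wc
print(logits.shape)
```

The attention matrix is dense over all 49 spatial positions, which is what lets the encoder relate distant regions of the dish that a convolution's local receptive field would miss.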
Our experiments on five diverse datasets confirmed superior performance compared to current state-of-the-art methods, and our combined dataset, leveraging complementary features, showed enhanced generalizability and robust performance in addressing global food diversity. We used explainability techniques like Grad-CAM and LIME to understand how the models made their decisions, thereby enhancing the user's trust in the proposed system. This model has been integrated into a mobile application for food recognition and nutrition analysis, offering features like an intelligent diet-tracking system.
This research paves the way for practical applications in personalized nutrition and healthcare, showcasing the extensive potential of AI in nutritional sciences across various dietary platforms.