
Convolution-Enhanced Bi-Branch Adaptive Transformer With Cross-Task Interaction for Food Category and Ingredient Recognition.

Publication information

IEEE Trans Image Process. 2024;33:2572-2586. doi: 10.1109/TIP.2024.3374211. Epub 2024 Apr 1.

DOI:10.1109/TIP.2024.3374211
PMID:38470580
Abstract

Recently, visual food analysis has received more and more attention in the computer vision community due to its wide application scenarios, e.g., diet nutrition management, smart restaurants, and personalized diet recommendation. Considering that food images are unstructured images with complex and unfixed visual patterns, mining food-related semantic-aware regions is crucial. Furthermore, the ingredients contained in food images are semantically related to each other due to cooking habits and have significant semantic relationships with food categories under the hierarchical food classification ontology. Therefore, modeling the long-range semantic relationships between ingredients and the category-ingredient semantic interactions is beneficial for ingredient recognition and food analysis. Taking these factors into consideration, we propose a multi-task learning framework for food category and ingredient recognition. This framework mainly consists of a food-oriented Transformer named Convolution-Enhanced Bi-Branch Adaptive Transformer (CBiAFormer) and a multi-task category-ingredient recognition network called Structural Learning and Cross-Task Interaction (SLCI). In order to capture the complex and unfixed fine-grained patterns of food images, we propose a query-aware data-adaptive attention mechanism called Bi-Branch Adaptive Attention (BiA-Attention) in CBiAFormer, which consists of a local fine-grained branch and a global coarse-grained branch to mine local and global semantic-aware regions for different input images through an adaptive assignment of candidate key/value sets for each query. Additionally, a convolutional patch embedding module is proposed to extract the fine-grained features which are neglected by Transformers.
To fully utilize the ingredient information, we propose SLCI, which consists of cross-layer attention to model the semantic relationships between ingredients and two cross-task interaction modules to mine the semantic interactions between categories and ingredients. Extensive experiments show that our method achieves competitive performance on three mainstream food datasets (ETH Food-101, Vireo Food-172, and ISIA Food-200). Visualization analyses of CBiAFormer and SLCI on two tasks demonstrate the effectiveness of our method. Code and models are available at https://github.com/Liuyuxinict/CBiAFormer.
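
The abstract describes an attention mechanism in which each query attends only to its own candidate key/value set, with one local branch and one global branch. The sketch below illustrates that general idea in plain Python and is not the authors' BiA-Attention. It makes several assumptions that do not come from the abstract: tokens form a 1-D sequence, the local branch is a fixed window around the query, the global branch keeps the `top_k` keys most similar to the query, the two branch outputs are averaged, and there are no learned projections. The names `attend`, `bi_branch_attention`, `window`, and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, candidates):
    """Scaled dot-product attention restricted to the given key/value indices."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) / scale for j in candidates])
    dim = len(values[0])
    return [
        sum(w * values[j][d] for w, j in zip(weights, candidates))
        for d in range(dim)
    ]


def bi_branch_attention(queries, keys, values, window=1, top_k=2):
    """Per-query candidate selection with two branches (illustrative only).

    Local branch: keys within `window` positions of the query (assumption).
    Global branch: the `top_k` keys with the highest similarity to the query
    (assumption). The two outputs are averaged (assumption).
    """
    n = len(keys)
    outputs = []
    for i, q in enumerate(queries):
        # Local candidates: a fixed neighbourhood around position i.
        local = list(range(max(0, i - window), min(n, i + window + 1)))
        # Global candidates: chosen per query from the whole sequence.
        ranked = sorted(range(n), key=lambda j: dot(q, keys[j]), reverse=True)
        global_candidates = ranked[:top_k]
        local_out = attend(q, keys, values, local)
        global_out = attend(q, keys, values, global_candidates)
        outputs.append([(a + b) / 2 for a, b in zip(local_out, global_out)])
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    for row in bi_branch_attention(tokens, tokens, tokens):
        print([round(x, 3) for x in row])
```

When `window` and `top_k` both cover the whole sequence, each branch reduces to ordinary full attention, which gives a simple sanity check for the sketch. The released repository linked in the abstract is the place to look for the actual implementation.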

Similar articles

1. Convolution-Enhanced Bi-Branch Adaptive Transformer With Cross-Task Interaction for Food Category and Ingredient Recognition.
   IEEE Trans Image Process. 2024;33:2572-2586. doi: 10.1109/TIP.2024.3374211. Epub 2024 Apr 1.
2. Ingredient-Guided Region Discovery and Relationship Modeling for Food Category-Ingredient Prediction.
   IEEE Trans Image Process. 2022;31:5214-5226. doi: 10.1109/TIP.2022.3193763. Epub 2022 Aug 4.
3. Ingredient Prediction via Context Learning Network With Class-Adaptive Asymmetric Loss.
   IEEE Trans Image Process. 2023;32:5509-5523. doi: 10.1109/TIP.2023.3318958. Epub 2023 Oct 5.
4. TGDAUNet: Transformer and GCNN based dual-branch attention UNet for medical image segmentation.
   Comput Biol Med. 2023 Dec;167:107583. doi: 10.1016/j.compbiomed.2023.107583. Epub 2023 Oct 21.
5. Transformer guided self-adaptive network for multi-scale skin lesion image segmentation.
   Comput Biol Med. 2024 Feb;169:107846. doi: 10.1016/j.compbiomed.2023.107846. Epub 2023 Dec 23.
6. Multi-Stage Network With Geometric Semantic Attention for Two-View Correspondence Learning.
   IEEE Trans Image Process. 2024;33:3031-3046. doi: 10.1109/TIP.2024.3391002. Epub 2024 Apr 30.
7. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
   Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
8. Fine-Grained Recognition With Learnable Semantic Data Augmentation.
   IEEE Trans Image Process. 2024;33:3130-3144. doi: 10.1109/TIP.2024.3364500. Epub 2024 Apr 30.
9. MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation.
   Phys Med Biol. 2023 Dec 28;69(1). doi: 10.1088/1361-6560/ad135d.
10. Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning.
    IEEE Trans Image Process. 2021;30:8410-8425. doi: 10.1109/TIP.2021.3115658. Epub 2021 Oct 7.

Cited by

1. Food Image Recognition Based on Anti-Noise Learning and Covariance Feature Enhancement.
   Foods. 2025 Aug 9;14(16):2776. doi: 10.3390/foods14162776.
2. 2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project.
   Nutrients. 2025 Jun 30;17(13):2196. doi: 10.3390/nu17132196.
3. FoodSky: A food-oriented large language model that can pass the chef and dietetic examinations.
   Patterns (N Y). 2025 Apr 22;6(5):101234. doi: 10.1016/j.patter.2025.101234. eCollection 2025 May 9.
4. Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval.
   Foods. 2024 May 23;13(11):1628. doi: 10.3390/foods13111628.