Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval.

Authors

Zou Zhuoyang, Zhu Xinghui, Zhu Qinying, Zhang Hongyan, Zhu Lei

Affiliation

College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China.

Publication

Foods. 2024 May 23;13(11):1628. doi: 10.3390/foods13111628.

DOI:10.3390/foods13111628
PMID:38890857
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11172226/
Abstract

As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, the semantic alignment across food images and recipes cannot be further enhanced due to the lack of intra-modal alignment in existing solutions. Additionally, a critical issue named food image ambiguity is overlooked, which disrupts the convergence of models. To these ends, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment together, this method measures the ambiguous food image similarity under the guidance of their corresponding recipes. Additionally, we enhance recipe semantic representation learning by involving a cross-attention module between ingredients and instructions, which is effective in supporting food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M; as a result, our method outperforms several state-of-the-art methods in commonly used evaluation criteria.
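
The two mechanisms the abstract names, cross-attention between ingredients and instructions and recipe-guided image similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the identity projections, mean pooling, concatenation, the squared-gap loss, and every function name below are assumptions chosen only to make the idea concrete.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention from one recipe component to the other.

    Assumption: learned query/key/value projections are replaced by the
    identity so the sketch stays short.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values))
             for d in range(dim)]
        )
    return attended


def mean_pool(tokens: Matrix) -> Vector:
    """Average token vectors into a single vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def recipe_embedding(ingredients: Matrix, instructions: Matrix) -> Vector:
    """Hypothetical fusion: attend in both directions, pool, concatenate."""
    ing_ctx = cross_attention(ingredients, instructions)
    ins_ctx = cross_attention(instructions, ingredients)
    return mean_pool(ing_ctx) + mean_pool(ins_ctx)


def recipe_guided_image_loss(img_a: Vector, img_b: Vector,
                             rec_a: Vector, rec_b: Vector) -> float:
    """Hypothetical intra-modal term: squared gap between the similarity of
    two images and the similarity of their paired recipes."""
    return (cosine(img_a, img_b) - cosine(rec_a, rec_b)) ** 2
```

Under these assumptions, two visually different photos of near-identical recipes are pushed toward similar image embeddings, which is one plausible reading of measuring ambiguous image similarity "under the guidance of their corresponding recipes".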

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a298/11172226/43f5c978700b/foods-13-01628-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a298/11172226/695b6123c78c/foods-13-01628-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a298/11172226/0e00f021cd85/foods-13-01628-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a298/11172226/8cceba4f5c7a/foods-13-01628-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a298/11172226/5e4cb2514f2b/foods-13-01628-g005.jpg
