Zou Zhuoyang, Zhu Xinghui, Zhu Qinying, Zhang Hongyan, Zhu Lei
College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China.
Foods. 2024 May 23;13(11):1628. doi: 10.3390/foods13111628.
As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, existing solutions lack intra-modal alignment, which limits further improvement of the semantic alignment between food images and recipes. Moreover, a critical issue, food image ambiguity, has been overlooked; it disrupts model convergence. To address these problems, we propose a novel Multi-Modal Alignment method for Cross-Modal Recipe retrieval (MMACMR). To account for inter-modal and intra-modal alignment jointly, the method measures the similarity of ambiguous food images under the guidance of their corresponding recipes. In addition, we enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which effectively supports the food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M; our method outperforms several state-of-the-art methods on commonly used evaluation metrics.
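To make the cross-attention idea in the abstract concrete, below is a minimal, hypothetical sketch of bidirectional cross-attention between ingredient and instruction embeddings, pooled into a single recipe representation. The class name, dimensions, residual-plus-norm structure, and mean-pool fusion are assumptions for illustration, not the authors' MMACMR implementation.

```python
# Hypothetical sketch: cross-attention between ingredient and instruction
# embeddings, in the spirit of the recipe encoder the abstract describes.
import torch
import torch.nn as nn


class IngredientInstructionCrossAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Ingredients attend to instructions, and vice versa.
        self.ing_to_ins = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ins_to_ing = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ing = nn.LayerNorm(dim)
        self.norm_ins = nn.LayerNorm(dim)

    def forward(self, ing: torch.Tensor, ins: torch.Tensor) -> torch.Tensor:
        # ing: (B, N_ing, dim) ingredient token embeddings
        # ins: (B, N_ins, dim) instruction sentence embeddings
        ing_attn, _ = self.ing_to_ins(query=ing, key=ins, value=ins)
        ins_attn, _ = self.ins_to_ing(query=ins, key=ing, value=ing)
        ing = self.norm_ing(ing + ing_attn)  # residual connection + layer norm
        ins = self.norm_ins(ins + ins_attn)
        # Mean-pool each stream and average into one recipe embedding,
        # which could then anchor the recipe-guided image similarity.
        return (ing.mean(dim=1) + ins.mean(dim=1)) / 2


# Usage with random tensors standing in for text-encoder outputs:
model = IngredientInstructionCrossAttention()
ing = torch.randn(4, 20, 512)  # 4 recipes, 20 ingredient tokens each
ins = torch.randn(4, 10, 512)  # 4 recipes, 10 instruction steps each
print(model(ing, ins).shape)   # torch.Size([4, 512])
```

In a retrieval setup of this kind, the pooled recipe embedding would typically be trained against image embeddings with a contrastive or triplet objective; the exact loss used by MMACMR is not specified in the abstract.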