Farinella Giovanni Maria, Allegra Dario, Moltisanti Marco, Stanco Filippo, Battiato Sebastiano
Dipartimento di Matematica e Informatica, Viale A. Doria 6, 95125 Catania, Italy.
Comput Biol Med. 2016 Oct 1;77:23-39. doi: 10.1016/j.compbiomed.2016.07.006. Epub 2016 Jul 13.
Automatic food understanding from images is an interesting challenge with applications in several domains. In particular, food intake monitoring is becoming increasingly important because of the key role it plays in health and market economies. In this paper, we study food image processing from the perspective of Computer Vision. As a first contribution, we present a survey of work on food image processing, from the early attempts to the current state-of-the-art methods. Since retrieval and classification engines able to work on food images are required to build automatic systems for diet monitoring (e.g., to be embedded in wearable cameras), we focus on the representation of food images, which plays a fundamental role in such understanding engines. Food retrieval and classification are challenging tasks because food exhibits high variability and intrinsic deformability. To properly study the peculiarities of different image representations, we propose the UNICT-FD1200 dataset. It is composed of 4754 food images of 1200 distinct dishes acquired during real meals. Each food plate is acquired multiple times, and the overall dataset presents both geometric and photometric variability. The images of the dataset have been manually labeled into 8 categories: Appetizer, Main Course, Second Course, Single Course, Side Dish, Dessert, Breakfast, Fruit. We have tested different state-of-the-art representations to assess their performance on the UNICT-FD1200 dataset. Finally, we propose a new representation based on the perceptual concept of Anti-Textons, which is able to encode spatial information between Textons and outperforms the other representations in the context of food retrieval and classification.
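To make the idea of a Texton-based image representation concrete, the following is a minimal sketch of a plain Bag-of-Textons pipeline with nearest-neighbour retrieval. It is not the Anti-Textons representation proposed in the paper, whose details the abstract does not give, and it encodes no spatial information between Textons. Everything in it is an assumption made for illustration: raw 3x3 grayscale patches as local features, a small k-means vocabulary, a chi-square distance, and the function names `patch_features`, `learn_textons`, `assign`, `texton_histogram`, `chi2` and `retrieve`.

```python
import numpy as np


def patch_features(img, size=3):
    """Return every size x size patch of a 2-D grayscale image as a flat row."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (size, size))
    return windows.reshape(-1, size * size).astype(np.float64)


def assign(features, centers):
    """Index of the nearest texton (squared Euclidean distance) per feature."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def learn_textons(features, k, iters=20, seed=0):
    """Plain k-means; the k centroids act as the texton vocabulary (assumption)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def texton_histogram(img, centers):
    """Normalized histogram of texton occurrences: the image representation."""
    labels = assign(patch_features(img), centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(np.float64)
    return hist / hist.sum()


def chi2(p, q, eps=1e-12):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))


def retrieve(query_hist, gallery_hists):
    """Gallery indices sorted from most to least similar to the query."""
    return np.argsort([chi2(query_hist, h) for h in gallery_hists])
```

In such a sketch, the vocabulary would be learned once from patches pooled over a set of training images, each dataset image would be stored as its histogram, and a query image would be answered by ranking the stored histograms with `retrieve`. A classifier over the 8 categories could then, for example, take the label of the top-ranked image.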