Mezgec Simon, Koroušić Seljak Barbara
Information and Communication Technologies, Jožef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia.
Computer Systems Department, Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia.
Nutrients. 2017 Jun 27;9(7):657. doi: 10.3390/nu9070657.
Automatic food image recognition systems are easing the process of food-intake estimation and dietary assessment. However, due to the nature of food images, their recognition is a particularly challenging task, which is why traditional approaches in the field have achieved low classification accuracy. Deep neural networks have outperformed such solutions, and we present a novel approach to the problem of food and drink image detection and recognition that uses a newly defined deep convolutional neural network architecture, called NutriNet. This architecture was tuned on a recognition dataset containing 225,953 images (512 × 512 pixels) of 520 different food and drink items from a broad spectrum of food groups, on which we achieved a classification accuracy of 86.72%, along with an accuracy of 94.47% on a detection dataset containing 130,517 images. We also performed a real-world test on a dataset of self-acquired images, combined with images from Parkinson's disease patients, all taken using a smartphone camera, achieving a top-five accuracy of 55%, which is an encouraging result for real-world images. Additionally, we tested NutriNet on the University of Milano-Bicocca 2016 (UNIMIB2016) food image dataset, on which we improved upon the provided baseline recognition result. An online training component was implemented to continually fine-tune the food and drink recognition model on new images. The model is being used in practice as part of a mobile app for the dietary assessment of Parkinson's disease patients.
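The abstract reports a top-five accuracy for the real-world test. As a minimal sketch of how such a metric is commonly computed (not the authors' evaluation code), the Python below counts a prediction as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Return the fraction of samples whose true label is among the
    k highest-scoring classes (hypothetical helper, for illustration)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (made up): 4 images, 3 classes.
scores = [
    [0.1, 0.6, 0.3],  # true class 1 is ranked first
    [0.5, 0.2, 0.3],  # true class 2 is ranked second
    [0.7, 0.2, 0.1],  # true class 2 is ranked third
    [0.2, 0.3, 0.5],  # true class 2 is ranked first
]
labels = [1, 2, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
top3 = top_k_accuracy(scores, labels, k=3)
print(top1, top2, top3)
```

With k = 1 the metric reduces to ordinary classification accuracy, which is why a top-five figure is always at least as high as the top-one figure on the same data.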