Zhao Yaping, Zhu Ping, Jiang Yizhang, Xia Kaijian
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China.
Changshu Key Laboratory of Medical Artificial Intelligence and Big Data, Suzhou, Jiangsu, China.
Front Nutr. 2024 Dec 17;11:1469878. doi: 10.3389/fnut.2024.1469878. eCollection 2024.
Nutrition is closely tied to health. A balanced diet not only meets the body's needs for various nutrients but also helps prevent many chronic diseases. However, because most people lack systematic nutritional knowledge, they often find it difficult to assess the nutritional content of food accurately. Image-based nutritional evaluation can help, so we aim to predict the nutritional content of dishes directly from images. Most related research estimates the volume or area of food through image segmentation and then calculates nutritional content from the food category. However, these methods usually lack ground-truth nutritional content labels as a reference, which makes it difficult to verify the accuracy of their predictions.
To address this issue, we combined segmentation and regression tasks. The Nutrition5k dataset provides detailed nutritional content labels but no segmentation labels, so we annotated segmentation labels for it manually. Using these annotated data, we developed a nutritional content prediction model that performs segmentation first and regression afterward. Specifically, we first applied a UNet model to segment the food, then used a backbone network to extract features, and strengthened the feature representation with a Squeeze-and-Excitation structure. Finally, the extracted features were passed through several fully connected layers to predict weight, calories, fat, carbohydrates, and protein content.
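The segment-then-regress flow described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: `toy_backbone` is a hypothetical stand-in (a single 1x1 projection) for the real backbone, the food mask is assumed to come from a separately trained UNet, the regression head is collapsed to one linear layer, and all weights are random placeholders. Only the Squeeze-and-Excitation step follows the standard formulation (global average pooling, a bottleneck, and sigmoid channel gates).

```python
import numpy as np

def toy_backbone(image, w_proj):
    """Hypothetical stand-in for a backbone: 1x1 projection (3 -> C channels) + ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", w_proj, image), 0.0)

def squeeze_excite(features, w1, w2):
    """Standard Squeeze-and-Excitation reweighting of a (C, H, W) feature map."""
    squeezed = features.mean(axis=(1, 2))          # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # bottleneck layer + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1) -> (C,)
    return features * gates[:, None, None]         # excite: scale each channel

def predict_nutrition(image, food_mask, w_proj, w1, w2, w_head, b_head):
    """Mask the food region, extract features, apply SE, regress 5 values."""
    masked = image * food_mask[None, :, :]         # keep only pixels labelled as food
    features = squeeze_excite(toy_backbone(masked, w_proj), w1, w2)
    pooled = features.mean(axis=(1, 2))            # (C,) descriptor of the dish
    # Assumed output order: weight, calories, fat, carbohydrates, protein
    return w_head @ pooled + b_head

# Random placeholder inputs and weights, for shape checking only.
rng = np.random.default_rng(0)
C, R, H, W = 8, 2, 16, 16
image = rng.random((3, H, W))
food_mask = (rng.random((H, W)) > 0.5).astype(float)   # pretend UNet output
w_proj = rng.normal(size=(C, 3))
w1 = rng.normal(size=(C // R, C))
w2 = rng.normal(size=(C, C // R))
w_head = rng.normal(size=(5, C))
b_head = rng.normal(size=5)

prediction = predict_nutrition(image, food_mask, w_proj, w1, w2, w_head, b_head)
```

In a real model each stage would be a trained network; the sketch only shows how the mask, the channel gates, and the five-output head fit together.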
Our model achieved an average percentage mean absolute error (PMAE) of 17.06% across these five components. All manually annotated segmentation labels can be found at https://doi.org/10.6084/m9.figshare.26252048.v1.
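The abstract does not spell out how PMAE is computed. The sketch below assumes the definition commonly used with Nutrition5k: for each component, the mean absolute error is expressed as a percentage of the mean ground-truth value, and the reported figure is the average over components. The function names and the sample numbers are illustrative only and are not taken from the paper.

```python
def pmae(predictions, targets):
    """MAE as a percentage of the mean ground-truth value (assumed definition)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    n = len(targets)
    mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / n
    mean_target = sum(targets) / n
    if mean_target == 0:
        raise ValueError("mean ground-truth value is zero; PMAE is undefined")
    return 100.0 * mae / mean_target

def average_pmae(pred_by_component, target_by_component):
    """Average the per-component PMAE values; also return the individual scores."""
    scores = {
        name: pmae(pred_by_component[name], target_by_component[name])
        for name in target_by_component
    }
    return sum(scores.values()) / len(scores), scores

# Made-up values for two dishes and two components.
preds = {"calories": [90.0, 220.0], "protein": [12.0, 18.0]}
truth = {"calories": [100.0, 200.0], "protein": [10.0, 20.0]}
mean_score, per_component = average_pmae(preds, truth)
```

Normalizing by the mean ground-truth value puts components with different units (grams, kilocalories) on a comparable scale before they are averaged.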