2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project.

Authors

Bianco Rachele, Coluccia Sergio, Marinoni Michela, Falcon Alex, Fiori Federica, Serra Giuseppe, Ferraroni Monica, Edefonti Valeria, Parpinel Maria

Affiliations

Department of Medicine-DMED, Università degli Studi di Udine, 33100 Udine, Italy.

Branch of Medical Statistics, Biometry and Epidemiology "G. A. Maccacaro", Department of Clinical Sciences and Community Health, Dipartimento di Eccellenza 2023-2027, Università degli Studi di Milano, 20133 Milano, Italy.

Publication information

Nutrients. 2025 Jun 30;17(13):2196. doi: 10.3390/nu17132196.

Abstract

Background/Objectives: Deep learning (DL) has shown strong potential in analyzing food images, but few studies have directly predicted mass, energy, and macronutrient content from images. Beyond the need for high-quality data, differences between country-specific food composition databases (FCDBs) can hinder model generalization.

Methods: We assessed the performance of several standard DL models using four ground truth datasets derived from Nutrition5k, the largest image-nutrition dataset, with ~5000 complex US cafeteria dishes. With a view to developing an Italian dietary assessment tool, these datasets varied by FCDB alignment (Italian vs. US) and data curation (ingredient-mass correction and frame filtering on the test set). We evaluated combinations of four feature extractors [ResNet-50 (R50), ResNet-101 (R101), InceptionV3 (IncV3), and Vision Transformer-B-16 (ViT-B-16)] with two regression networks (2+1 and 2+2), using IncV3_2+2 as the benchmark. Descriptive statistics (percentages of agreement, unweighted Cohen's kappa, and Bland-Altman plots) and standard regression metrics were used to compare predicted and ground truth nutritional composition. Dishes mispredicted by ≥7 algorithms were analyzed separately.

Results: R50, R101, and ViT-B-16 consistently outperformed the benchmark across all datasets. Specifically, when the benchmark was replaced with these top algorithms, the reductions in median Mean Absolute Percentage Error were 6.2% for mass, 6.4% for energy, 12.3% for fat, and 33.1% and 40.2% for protein and carbohydrates, respectively. Ingredient-mass correction substantially improved prediction metrics (6-42% when considering the top algorithms), while frame filtering had a more limited effect (<3%). Performance was consistently poor across most models for complex salads, chicken-based or egg-based dishes, and Western-inspired breakfasts.

Conclusions: The R101 and ViT-B-16 architectures will be prioritized in future analyses, where ingredient-mass correction and automated frame filtering methods will be considered.
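As an illustration of the comparison metrics named in the abstract, the following minimal Python sketch computes the Mean Absolute Percentage Error, Bland-Altman bias with limits of agreement, and unweighted Cohen's kappa. It is not the authors' code: the function names, the example values, and the category labels are hypothetical, and the abstract does not state how values were categorized for the kappa analysis.

```python
from statistics import mean, stdev


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes no zero ground-truth value)."""
    return 100.0 * mean(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))


def bland_altman(y_true, y_pred):
    """Return (bias, lower limit, upper limit) with the usual bias +/- 1.96 SD limits."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa; undefined (division by zero) if expected agreement is 1."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical energy values (kcal) for four dishes; not taken from the study.
true_energy = [100.0, 200.0, 400.0, 250.0]
pred_energy = [110.0, 180.0, 400.0, 300.0]

# Hypothetical category labels for the same four dishes.
true_class = ["low", "mid", "high", "mid"]
pred_class = ["low", "mid", "mid", "mid"]

energy_mape = mape(true_energy, pred_energy)
bias, lower, upper = bland_altman(true_energy, pred_energy)
kappa = cohen_kappa(true_class, pred_class)

print(f"MAPE: {energy_mape:.1f}%")
print(f"Bland-Altman bias: {bias:.1f} (limits {lower:.1f} to {upper:.1f})")
print(f"Cohen's kappa: {kappa:.3f}")
```

A lower MAPE indicates predictions closer to the ground truth in relative terms, which is the sense in which the abstract reports reductions when the benchmark is replaced.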

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44c4/12252204/0754a98a32d0/nutrients-17-02196-g001.jpg
