Department of Medicine-DMED, Università degli Studi di Udine, 33100 Udine, Italy.
Branch of Medical Statistics, Biometry and Epidemiology "G. A. Maccacaro", Department of Clinical Sciences and Community Health, Dipartimento di Eccellenza 2023-2027, Università degli Studi di Milano, 20133 Milan, Italy.
Nutrients. 2024 Oct 1;16(19):3339. doi: 10.3390/nu16193339.
Training machine learning algorithms on dish images collected in other countries requires tackling possible sources of systematic discrepancies, including country-specific food composition databases (FCDBs). The US Nutrition5k project provides ~5000 dish images and related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. This study aims to (1) identify challenges and solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated across the Italian and US FCDBs, together with the determinants of those differences.
After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement/difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of identified differences were investigated with multiple robust regression models.
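The dish-level calculation described above can be illustrated with a minimal sketch. It assumes that composition values are stored per 100 g of food and that each dish is a list of ingredient masses in grams; the food names, nutrient keys, and numbers below are hypothetical and are not taken from either FCDB.

```python
from collections import defaultdict

# Hypothetical per-100 g composition values (illustrative only,
# not taken from the Italian or US FCDB).
COMPOSITION_PER_100G = {
    "rice": {"energy_kcal": 130.0, "carbohydrates_g": 28.0},
    "chicken": {"energy_kcal": 165.0, "carbohydrates_g": 0.0},
}


def dish_nutrients(ingredients, composition):
    """Sum ingredient-specific nutrient amounts into dish-level totals.

    ingredients: list of (food_name, mass_in_grams) pairs.
    composition: mapping food_name -> {nutrient: amount per 100 g}.
    """
    totals = defaultdict(float)
    for name, mass_g in ingredients:
        for nutrient, per_100g in composition[name].items():
            # Scale the per-100 g value to the ingredient's actual mass.
            totals[nutrient] += per_100g * mass_g / 100.0
    return dict(totals)


# A hypothetical two-ingredient dish.
dish = [("rice", 100.0), ("chicken", 50.0)]
totals = dish_nutrients(dish, COMPOSITION_PER_100G)
```

Running the same function once with an Italian-FCDB mapping and once with a US-FCDB mapping would give the two dish-level values that are then compared.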
Dishes had a median mass of 145 g and a median of three ingredients. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the weakest agreement, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: -2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs.
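As a rough illustration of a raw-difference measure such as the one reported above, the sketch below takes the median of paired dish-level differences (Italian minus US). This pairing is an assumption: the abstract does not state whether the figure is the median of paired differences or a difference of medians. The function name and data are hypothetical.

```python
from statistics import median


def median_raw_difference(italian, us):
    """Median of paired differences (Italian minus US) across dishes."""
    if len(italian) != len(us):
        raise ValueError("both lists must hold one value per dish")
    return median(it - u for it, u in zip(italian, us))


# Hypothetical carbohydrate content (g) for three dishes.
italian_carbs = [10.0, 25.0, 40.0]
us_carbs = [12.0, 28.0, 41.0]
diff = median_raw_difference(italian_carbs, us_carbs)
```

A negative result means the Italian FCDB tends to give smaller values than the US FCDB for the same dishes.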
In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.