Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods.

Authors

Hu Guanlan, Ahmed Mavra, L'Abbé Mary R

Affiliations

Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada.

Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Joannah & Brian Lawson Centre for Child Nutrition, University of Toronto, ON, Canada.

Publication information

Am J Clin Nutr. 2023 Mar;117(3):553-563. doi: 10.1016/j.ajcnut.2022.11.022. Epub 2022 Dec 23.

DOI: 10.1016/j.ajcnut.2022.11.022
PMID: 36872019
Abstract

BACKGROUND

Food categorization and nutrient profiling are labor intensive, time consuming, and costly tasks, given the number of products and labels in large food composition databases and the dynamic food supply.

OBJECTIVES

This study used a pretrained language model and supervised machine learning to automate food category classification and nutrition quality score prediction based on manually coded and validated data, and compared prediction results with models using bag-of-words and structured nutrition facts as inputs for predictions.

METHODS

Food product information from the University of Toronto Food Label Information and Price Database 2017 (n = 17,448) and the University of Toronto Food Label Information and Price Database 2020 (n = 74,445) was used. Health Canada's Table of Reference Amounts (TRA) (24 categories and 172 subcategories) was used for food categorization, and the Food Standards of Australia and New Zealand (FSANZ) nutrient profiling system was used for nutrition quality score evaluation. TRA categories and FSANZ scores were manually coded and validated by trained nutrition researchers. A modified pretrained sentence-Bidirectional Encoder Representations from Transformers model was used to encode unstructured text from food labels into lower-dimensional vector representations, followed by supervised machine learning algorithms (i.e., elastic net, k-Nearest Neighbors, and XGBoost) for multiclass classification and regression tasks.
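To make the comparison baseline concrete, the following is a minimal sketch of bag-of-words features combined with a k-Nearest Neighbors classifier, written with the Python standard library only. It is not the authors' implementation: the tokenizer, the cosine similarity measure, the function names, and the toy product names and category labels are all assumptions chosen for illustration, and the sketch does not use the pretrained language model or the study's databases.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Return the majority label among the k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy examples; not taken from the study's databases,
# and the labels are not actual TRA categories.
TRAIN = [
    ("whole wheat bread sliced", "bakery"),
    ("multigrain bread loaf", "bakery"),
    ("sourdough bread rolls", "bakery"),
    ("orange juice no pulp", "beverages"),
    ("apple juice from concentrate", "beverages"),
    ("sparkling water lemon", "beverages"),
]

print(knn_predict(TRAIN, "rye bread loaf"))
print(knn_predict(TRAIN, "grape juice"))
```

A bag-of-words vector only matches tokens that appear verbatim in the training text, which is one plausible reason a pretrained sentence encoder could generalize better to unseen product names.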

RESULTS

With pretrained language model representations as input, the XGBoost multiclass classification algorithm reached overall accuracy scores of 0.98 and 0.96 in predicting food TRA major categories and subcategories, respectively, outperforming bag-of-words methods. For FSANZ score prediction, the proposed method reached a prediction accuracy (R: 0.87; MSE: 14.4) similar to that of bag-of-words methods (R: 0.72-0.84; MSE: 30.3-17.6), whereas the machine learning model using structured nutrition facts performed best (R: 0.98; MSE: 2.5). The pretrained language model generalized better to the external test datasets than bag-of-words methods.
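For readers less familiar with the two evaluation measures that are unambiguous in the results above, overall accuracy for classification and mean squared error (MSE) for score prediction, the sketch below shows how each is computed. The function names and the example values are hypothetical and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and scores, for illustration only.
print(accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"]))
print(mean_squared_error([10, 4, -2], [8, 4, 1]))
```

Higher accuracy and lower MSE both indicate better predictions, which is why the MSE range for bag-of-words methods is listed from worst (30.3) to best (17.6).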

CONCLUSIONS

Our automation achieved high accuracy in classifying food categories and predicting nutrition quality scores using text information found on food labels. This approach is effective and generalizable in a dynamic food environment, where large amounts of food label data can be obtained from websites.
