Machine Learning Approaches for Predicting Fatty Acid Classes in Popular US Snacks Using NHANES Data.

Affiliations

Food Science and Biotechnology Program, Department of Human Ecology, College of Agriculture, Science and Technology, Delaware State University, 1200 N DuPont Highway, Dover, DE 19901, USA.

Department of Computational Data Science and Engineering, North Carolina Agricultural and Technical State University, 1601 E Market St, Greensboro, NC 27411, USA.

Publication information

Nutrients. 2023 Jul 26;15(15):3310. doi: 10.3390/nu15153310.

Abstract

In the US, people frequently snack between meals, consuming calorie-dense foods such as baked goods (cakes), sweets, and desserts (ice cream) that are high in lipids, salt, and sugar. Monounsaturated fatty acids (MUFAs) and polyunsaturated fatty acids (PUFAs) are considered reasonably healthy; however, excessive consumption of foods high in saturated fatty acids (SFAs) has been linked to an elevated risk of cardiovascular disease. The National Health and Nutrition Examination Survey (NHANES) uses a 24 h dietary recall to collect information on people's food habits in the US. The complexity of the NHANES data calls for machine learning (ML) methods, a branch of data science that applies algorithms to large structured and unstructured data sets to identify relationships among the data variables. This study assessed the ability of ML regression models, including artificial neural networks (ANNs), decision trees (DTs), k-nearest neighbors (KNNs), and support vector machines (SVMs), to explain the variability in total fat content in terms of the fatty acid classes (SFA, MUFA, and PUFA) of snacks consumed in the US between 2017 and 2018. KNNs predicted SFA, MUFA, and PUFA with mean squared errors (MSEs) of 0.707, 0.489, and 0.612, and DTs with MSEs of 1.172, 0.846, and 0.738, respectively. SVMs failed to predict the fatty acids accurately; however, ANNs performed satisfactorily. Using ensemble methods, DTs (10.635, 5.120, 7.075) showed higher MSE values than linear regression (LiR) (9.086, 3.698, 5.820) for SFA, MUFA, and PUFA prediction, respectively. The R score ranged from -0.541 to 0.983 and from 0.390 to 0.751 for models one and two, respectively. Extreme gradient boost (XGR), light gradient boost (LightGBM), and random forest (RF) performed better than LiR, with RF having the lowest MSE in predicting all the fatty acid classes.
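The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy values (total fat in grams mapped to SFA in grams) are invented for illustration, the helper names `mse`, `r2`, and `knn_predict` are hypothetical, and it assumes the reported R score is the coefficient of determination, which the abstract does not confirm. Under that assumption, the sketch shows how a score can fall below zero when predictions are worse than simply predicting the mean.

```python
from math import dist

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def knn_predict(train_X, train_y, x, k=2):
    """k-nearest-neighbors regression: mean target of the k closest points."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return sum(train_y[i] for i in order[:k]) / k

# Invented toy data (NOT from NHANES): total fat (g) -> SFA (g).
train_X = [(2.0,), (5.0,), (10.0,), (20.0,)]
train_y = [0.5, 1.5, 4.0, 9.0]
test_X = [(4.0,), (12.0,)]
test_y = [1.0, 5.0]

knn_pred = [knn_predict(train_X, train_y, x) for x in test_X]
knn_mse = mse(test_y, knn_pred)
knn_r2 = r2(test_y, knn_pred)

# A deliberately poor set of predictions yields a negative score.
bad_pred = [5.0, 1.0]
bad_r2 = r2(test_y, bad_pred)

print(f"KNN MSE={knn_mse:.3f}, R2={knn_r2:.3f}; poor model R2={bad_r2:.3f}")
```

A lower MSE and a score closer to 1 both indicate a better fit, which is the sense in which the abstract ranks the models.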

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d2/10421424/503bc32900d2/nutrients-15-03310-g001.jpg
