Larke Jules A, Lemay Danielle G
US Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, California.
US Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, California; Department of Nutrition, University of California, Davis, Davis, California.
J Acad Nutr Diet. 2025 May 28. doi: 10.1016/j.jand.2025.05.012.
Methods for modeling the relationship between self-reported 24-hour dietary recalls and health outcomes are traditionally based on nutrients and/or dietary patterns. Machine learning (ML), combined with hierarchical representations of diet, may help improve estimates of health and identify specific foods associated with diet-induced inflammation.
The aim of this study was to assess the accuracy of estimating systemic inflammation from hierarchically arranged ingredient-level diets in a large US cohort.
This was a cross-sectional analysis using data on US adults from the National Health and Nutrition Examination Survey 2001-2010 and 2015-2018 cycles.
Participants were adult men and women in the continuous National Health and Nutrition Examination Survey who completed 1 or more 24-hour dietary recalls with an energy intake between 500 and 4500 kcal/day, had no active infection or acute phase response, and had a measured serum C-reactive protein (CRP) level (N = 19 460).
The main outcome measure was classification accuracy for predicting high and low inflammation based on the top and bottom tertiles of CRP level.
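As a minimal sketch of how such a binary outcome could be derived, the snippet below labels each participant by CRP tertile. It assumes the middle tertile is left out of the classification task, which the abstract implies but does not state outright. The function name and the CRP values are hypothetical.

```python
from statistics import quantiles


def label_crp_tertiles(crp_values):
    """Return 1 (top tertile), 0 (bottom tertile), or None (middle tertile)."""
    # quantiles(..., n=3) returns the two cut points that split the data in thirds.
    low_cut, high_cut = quantiles(crp_values, n=3)
    labels = []
    for crp in crp_values:
        if crp <= low_cut:
            labels.append(0)      # low inflammation
        elif crp > high_cut:
            labels.append(1)      # high inflammation
        else:
            labels.append(None)   # middle tertile, assumed excluded
    return labels


# Made-up CRP values (mg/L), for illustration only.
example_crp = [0.2, 0.4, 0.5, 1.1, 1.6, 2.0, 4.8, 6.3, 9.0]
example_labels = label_crp_tertiles(example_crp)
```

Rows labelled `None` would be dropped before training, leaving a two-class problem.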
Mixed meal disaggregation was performed to generate an ingredient-level representation of diet that was further annotated to produce a hierarchical data structure, or food tree. Hierarchical feature engineering selected the most informative food tree features for predicting systemic inflammation (ie, CRP level). ML models were used to assess the accuracy of predicting CRP level from the food tree features compared with the Dietary Inflammatory Index (DII) score. Logistic regression was used to calculate the marginal effects of ingredients identified from ML models.
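The food tree idea can be illustrated with a small roll-up: once a recall has been disaggregated into ingredients, each ingredient's intake is added to every ancestor node, so that both broad groups and specific ingredients are available as candidate features. This is a sketch only. The tree paths, category names, and gram amounts are invented for illustration and are not the annotation scheme used in the study, and the selection step that picks the most informative nodes is not shown.

```python
from collections import defaultdict

# Hypothetical ingredient -> path in the food tree (root first).
FOOD_TREE = {
    "brown rice": ("grains", "rice", "brown rice"),
    "dry pasta": ("grains", "pasta", "dry pasta"),
    "herbal tea": ("beverages", "tea", "herbal tea"),
    "soda": ("beverages", "sweetened drinks", "soda"),
    "fruit punch": ("beverages", "sweetened drinks", "fruit punch"),
}


def roll_up(intake_g, tree=FOOD_TREE):
    """Sum ingredient intakes (grams) into every node on the path to the root."""
    totals = defaultdict(float)
    for ingredient, grams in intake_g.items():
        path = tree[ingredient]
        for depth in range(1, len(path) + 1):
            # Node key is the path from the root down to this depth.
            totals["/".join(path[:depth])] += grams
    return dict(totals)


# One made-up, already disaggregated 24-hour recall.
recall = {"brown rice": 150.0, "soda": 355.0, "fruit punch": 240.0}
features = roll_up(recall)
```

Here `features["beverages/sweetened drinks"]` holds the combined soda and fruit punch intake, while the leaf nodes keep the ingredient-level amounts.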
Representing diet as an ingredient-level food tree reduced the dietary features from 6412 unique foods to 566 unique ingredients. On never-seen data, ML classifiers trained on food tree data predicted high vs low systemic inflammation (CRP level tertile) with accuracy (0.761) similar to that of models trained on DII scores (0.757) (McNemar test P = .5). Individual dietary components associated with increased inflammation included fruit punch, soda, and high-fat milk (marginal effects: 0.001 to 0.005; P < .05); those associated with decreased inflammation included herbal tea, brewed espresso, decaf coffee, brown rice, and dry pasta (marginal effects: -0.08 to -0.001; P < .05).
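The McNemar test compares two classifiers evaluated on the same held-out participants by looking only at the discordant cases, where one model is right and the other is wrong. The sketch below uses the exact (binomial) form of the test. Whether the study used the exact or the chi-square version is not stated, and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P value.

    b: cases model A classified correctly and model B incorrectly
    c: cases model B classified correctly and model A incorrectly
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null, discordant cases fall on either side with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only.
p_balanced = mcnemar_exact_p(5, 5)     # the models disagree evenly
p_one_sided = mcnemar_exact_p(0, 10)   # one model wins every disagreement
```

A large P value, as reported above, means the discordant cases split roughly evenly, so there is no evidence that one model is more accurate than the other.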
Specific ingredients selected from a food tree performed as well as the DII in predicting systemic inflammation. The associations of common foods and beverages with inflammation varied in magnitude and direction, consistent with previous studies that have demonstrated pro- and anti-inflammatory responses to these dietary components.