Department of Psychology, Drexel University, Philadelphia, PA, 19104, USA.
Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA, 19111, USA.
BMC Med Res Methodol. 2018 Oct 29;18(1):119. doi: 10.1186/s12874-018-0585-8.
Diet plays an important role in chronic disease, and the use of dietary pattern analysis has grown rapidly as a way of deconstructing the complexity of nutritional intake and its relation to health. Pattern analysis methods, such as principal component analysis (PCA), have been used to investigate various dimensions of diet. Existing analytic methods, however, do not fully utilize the predictive potential of dietary assessment data. In particular, these methods are often suboptimal at predicting clinically important variables.
We propose a new dietary pattern analysis method that uses the LASSO (Least Absolute Shrinkage and Selection Operator) model to improve the prediction of disease-related risk factors. Despite the potential advantages of LASSO, this is the first time that the model has been adapted for dietary pattern analysis, so its systematic evaluation on dietary data and health outcomes is novel. Using Food Frequency Questionnaire data from NHANES 2005-2006, we apply PCA and LASSO to identify dietary patterns related to cardiovascular disease risk factors in healthy US adults (n = 2609) after controlling for confounding variables (e.g., age and BMI). Both analyses account for the sampling weights. Prediction accuracy is evaluated on an independent test set.
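For readers unfamiliar with the model, the standard LASSO estimator for a continuous outcome can be written as below. The notation is an assumption introduced here for illustration and does not come from the abstract: y_i is a risk factor value for participant i, x_i is that participant's vector of p dietary (and confounder) variables, and λ ≥ 0 is a tuning parameter that controls how many coefficients are shrunk exactly to zero.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

The abstract does not state how the sampling weights enter the fit. One common option, offered here only as a possibility, is to multiply each squared residual by a survey weight w_i.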
PCA yields 10 principal components (PCs) that together account for 65% of the variation in the data set and represent distinct dietary patterns. These PCs are then used as predictors in a regression model to predict cardiovascular disease risk factors. We find that LASSO predicts levels of triglycerides, LDL cholesterol, HDL cholesterol, and total cholesterol (adjusted R² = 0.861, 0.899, 0.890, and 0.935, respectively) better than the traditional dietary pattern analysis method, in which linear regression is applied to the PCA-derived components (adjusted R² = 0.163, 0.005, 0.235, and 0.024, respectively).
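The comparison described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the helper names (`adjusted_r2`, `compare_pca_and_lasso`, `make_synthetic_diet_data`) are hypothetical, and sampling weights and confounder adjustment are omitted. It fits a regression on 10 principal components and a cross-validated LASSO on the same training split, then reports adjusted R² on a held-out test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def compare_pca_and_lasso(X, y, n_components=10, seed=0):
    """Fit PCA + linear regression and LASSO; score both on a held-out test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # Traditional approach: regress the outcome on the leading principal components.
    pcr = make_pipeline(
        StandardScaler(), PCA(n_components=n_components), LinearRegression()
    )
    pcr.fit(X_train, y_train)

    # LASSO on the original variables; the penalty is chosen by 5-fold cross-validation.
    lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=seed))
    lasso.fit(X_train, y_train)

    n_test = len(y_test)
    n_selected = int(np.count_nonzero(lasso[-1].coef_))
    return {
        "pcr_adj_r2": adjusted_r2(pcr.score(X_test, y_test), n_test, n_components),
        "lasso_adj_r2": adjusted_r2(lasso.score(X_test, y_test), n_test, n_selected),
        "lasso_n_selected": n_selected,
    }


def make_synthetic_diet_data(n=600, p=40, seed=0):
    """Synthetic stand-in for food-frequency data: only 5 of p variables matter."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = [1.5, -1.0, 0.8, -0.6, 0.5]
    y = X @ beta + rng.normal(scale=0.5, size=n)
    return X, y


X, y = make_synthetic_diet_data()
results = compare_pca_and_lasso(X, y)
print(results)
```

On this synthetic data the outcome depends on a few individual variables rather than on the directions of greatest variance, so the PC-based regression is expected to score lower than LASSO. The numbers it prints say nothing about the NHANES results reported above.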
The proposed method is shown to be an appropriate and promising statistical means of deriving dietary patterns predictive of cardiovascular disease risk. Future studies, involving different diseases and risk factors, will be necessary before LASSO's broader usefulness in nutritional epidemiology can be established.