Fransen Heidi P, May Anne M, Stricker Martin D, Boer Jolanda M A, Hennig Christian, Rosseel Yves, Ocké Marga C, Peeters Petra H M, Beulens Joline W J
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands; National Institute for Public Health and the Environment, Bilthoven, The Netherlands.
Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
J Nutr. 2014 Aug;144(8):1274-82. doi: 10.3945/jn.113.188680. Epub 2014 May 28.
Principal component analysis (PCA) and cluster analysis are used frequently to derive dietary patterns. Decisions on how many patterns to extract are primarily based on subjective criteria, whereas different solutions vary in their food-group composition and perhaps in their association with disease outcome. Literature on the reliability of dietary patterns is scarce, and previous studies validated only 1 preselected solution. Therefore, we assessed the reliability of different pattern solutions ranging from 2 to 6 patterns, derived from the aforementioned methods. A validated food frequency questionnaire was administered at baseline (1993-1997) to 39,678 participants in the European Prospective Investigation into Cancer and Nutrition-The Netherlands (EPIC-NL) cohort. Food items were grouped into 31 food groups for dietary pattern analysis. The cohort was randomly divided into 2 halves, and dietary pattern solutions derived in sample 1 through PCA were replicated through confirmatory factor analysis in sample 2. For cluster analysis, cluster stability and split-half reproducibility were assessed for various solutions. With PCA, we found the 3-component solution to be best replicated, although all solutions contained ≥1 poorly confirmed component. No quantitative criterion was in agreement with the results. Associations with disease outcome (coronary heart disease) differed between the component solutions. For all cluster solutions, stability was excellent and deviations between samples were negligible, indicating good reproducibility. All quantitative criteria identified the 2-cluster solution as optimal. Associations with disease outcome were comparable for different cluster solutions. In conclusion, reliability of obtained dietary patterns differed considerably for different solutions using PCA, whereas cluster analysis derived generally stable, reproducible clusters across different solutions.
Quantitative criteria for determining the number of patterns to retain were valuable for cluster analysis but not for PCA. Associations with disease risk were influenced by the number of patterns retained, especially when using PCA. Therefore, studies on associations between dietary patterns and disease risk should report the reasons for choosing the number of retained patterns.
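The split-half idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the study replicated PCA solutions through confirmatory factor analysis in the second sample, whereas this sketch swaps in a simpler check, Tucker's congruence coefficient between PCA loadings derived independently in each half. All function names are hypothetical, and the data are synthetic stand-ins rather than EPIC-NL food-group intakes.

```python
import numpy as np

def pca_loadings(X, n_components):
    """PCA on the correlation matrix; returns loadings (variables x components)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def tucker_congruence(a, b):
    """Congruence coefficients between every column of a and every column of b."""
    a_n = a / np.linalg.norm(a, axis=0)
    b_n = b / np.linalg.norm(b, axis=0)
    return a_n.T @ b_n

def split_half_congruence(X, n_components, seed=0):
    """Randomly split rows into two halves, run PCA in each half, and return,
    for each half-1 component, its best absolute congruence with any
    half-2 component (the absolute value ignores arbitrary sign flips)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    half = X.shape[0] // 2
    load_1 = pca_loadings(X[idx[:half]], n_components)
    load_2 = pca_loadings(X[idx[half:]], n_components)
    return np.abs(tucker_congruence(load_1, load_2)).max(axis=1)

# Synthetic example (assumption): 7 "food groups" driven by 2 latent patterns.
rng = np.random.default_rng(42)
latent = rng.normal(size=(2000, 2))
weights = np.zeros((7, 2))
weights[:4, 0] = 0.9   # first pattern drives variables 0-3
weights[4:, 1] = 0.7   # second pattern drives variables 4-6
X = latent @ weights.T + 0.5 * rng.normal(size=(2000, 7))

for k in range(2, 5):
    print(k, np.round(split_half_congruence(X, k), 2))
```

In this toy setting, components beyond the two built-in patterns reflect only noise, so they would be expected to replicate less well across halves. That loosely mirrors the abstract's observation that every PCA solution contained at least one poorly confirmed component, although the sketch makes no claim about the study's actual results.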