Santos Roberta de Oliveira, Gorgulho Bartira Mendes, Castro Michelle Alessandra de, Fisberg Regina Mara, Marchioni Dirce Maria, Baltar Valéria Troncoso
Department of Nutrition, School of Public Health, Universidade de São Paulo - São Paulo (SP), Brazil.
Department of Epidemiology and Biostatistics, Collective Health Institute, Universidade Federal Fluminense - Rio de Janeiro (RJ), Brazil.
Rev Bras Epidemiol. 2019 Jul 29;22:e190041. doi: 10.1590/1980-549720190041.
Statistical methods such as Principal Component Analysis (PCA) and Factor Analysis (FA) are increasingly popular in Nutritional Epidemiology studies. However, misunderstandings regarding the choice and application of these methods have been observed.
This study aims to compare and present the main differences and similarities between FA and PCA, focusing on their applicability to nutritional studies.
PCA and FA were applied to a matrix of 34 variables expressing the mean food intake of 1,102 individuals from a population-based study.
Two factors were extracted by FA; together, they explained 57.66% of the common variance of the food group variables. Five components were extracted by PCA, explaining 26.25% of the total variance of the food group variables. The main differences between the two methods concern the normality assumption, the variance-covariance or correlation matrix analyzed and the variance it explains, the factor scores, and the associated error. The similarities are that both analyses are used for data reduction, both usually require a large sample and correlated data, and both are based on variance-covariance matrices.
PCA and FA should not be treated as equal statistical methods, given that the theoretical rationale and assumptions for using these methods as well as the interpretation of results are different.
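The distinction between total variance (PCA) and common variance (FA) can be illustrated with a minimal sketch. It does not use the study data or reproduce the authors' procedure: the data are synthetic, the six variables and two latent factors are assumptions chosen for illustration, and FA is fitted by iterated principal axis factoring, which is only one of several possible extraction methods.

```python
import numpy as np

# Synthetic data (an assumption, not the study data): 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
n, p, k = 1000, 6, 2
true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                          [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
uniqueness = 1.0 - (true_loadings ** 2).sum(axis=1)
factors = rng.standard_normal((n, k))
noise = rng.standard_normal((n, p)) * np.sqrt(uniqueness)
X = factors @ true_loadings.T + noise

R = np.corrcoef(X, rowvar=False)  # correlation matrix of the variables

# PCA: eigen-decomposition of the full correlation matrix (ones on the diagonal),
# so the components partition the TOTAL variance, trace(R) = p.
pca_eigvals = np.linalg.eigvalsh(R)[::-1]  # sorted in descending order
pca_share = pca_eigvals[:k].sum() / p

# FA (principal axis factoring): the diagonal is replaced by communality
# estimates, so only the variance SHARED among variables is modelled.
h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial guess: squared multiple correlations
for _ in range(200):
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, h2)
    vals, vecs = np.linalg.eigh(R_reduced)
    top = np.argsort(vals)[::-1][:k]
    L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))  # factor loadings
    h2_new = (L ** 2).sum(axis=1)
    converged = np.max(np.abs(h2_new - h2)) < 1e-6
    h2 = h2_new
    if converged:
        break

fa_share = h2.sum() / p  # common variance as a share of total variance

print(f"PCA: first {k} components, share of total variance = {pca_share:.3f}")
print(f"FA:  {k} factors, common variance as share of total = {fa_share:.3f}")
```

In this sketch the PCA components account for a larger share of total variance than the FA factors, because PCA also absorbs the variance unique to each variable, whereas FA sets that part aside as error.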