Liu Chenyu, Zhang Xinlian, Nguyen Tanya T, Liu Jinyuan, Wu Tsungchin, Lee Ellen, Tu Xin M
Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego, La Jolla, California, USA.
Department of Psychiatry, Stein Institute for Research on Aging, UC San Diego, La Jolla, California, USA.
Gen Psychiatr. 2022 Jan 27;35(1):e100662. doi: 10.1136/gpsych-2021-100662. eCollection 2022.
In many statistical applications, composite variables are constructed to reduce the number of variables and improve the performance of statistical analyses, especially when some of the variables are highly correlated. Principal component analysis (PCA) and factor analysis (FA) are generally used for this purpose. If the variables serve as explanatory (independent) variables in a linear regression analysis, partial least squares (PLS) regression is a better alternative. Unlike PCA and FA, PLS also takes the response (dependent) variable into account when creating composite variables, so that its composites have higher correlations with the response than those produced by PCA and FA. In this report, we provide an introduction to this useful approach and illustrate it with data from a real study.
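The contrast between PCA and PLS composites can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic, all variable names are hypothetical, NumPy is assumed to be available, and only the first composite of each method is computed. The first PCA composite uses the direction of largest variance in the explanatory variables, whereas the first PLS composite uses the direction proportional to their covariances with the response.

```python
import numpy as np

# Synthetic data (an assumption for illustration only): four explanatory
# variables built from two latent factors, so columns 0-1 and columns 2-3
# are highly correlated pairs. The response depends on the second factor.
rng = np.random.default_rng(0)
n = 200
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
X = np.column_stack([
    3.0 * z1 + 0.1 * rng.normal(size=n),
    3.0 * z1 + 0.1 * rng.normal(size=n),
    z2 + 0.1 * rng.normal(size=n),
    z2 + 0.1 * rng.normal(size=n),
])
y = z2 + 0.5 * rng.normal(size=n)

# Center the explanatory variables and the response.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PCA composite: project onto the eigenvector of Xc'Xc with the
# largest eigenvalue (eigh returns eigenvalues in ascending order).
# The response is not used at all.
_, eigvecs = np.linalg.eigh(Xc.T @ Xc)
w_pca = eigvecs[:, -1]
t_pca = Xc @ w_pca

# First PLS composite: the unit-length weight vector is proportional to
# Xc'yc, the direction that maximizes covariance with the response.
w_pls = Xc.T @ yc
w_pls = w_pls / np.linalg.norm(w_pls)
t_pls = Xc @ w_pls


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


corr_pca = abs_corr(t_pca, yc)
corr_pls = abs_corr(t_pls, yc)
print(f"|corr(first PCA composite, y)| = {corr_pca:.3f}")
print(f"|corr(first PLS composite, y)| = {corr_pls:.3f}")
```

For the first composite, the PLS direction has at least as large a squared covariance with the response as the PCA direction, and no larger a variance, so its absolute correlation with the response cannot be smaller. In practice the explanatory variables are often standardized before either method is applied; that step is omitted here to keep the sketch short.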