Zhang Zhongheng, Castelló Adela
Department of Emergency Medicine, Sir Run-Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310016, China.
Cancer Epidemiology Unit, National Center for Epidemiology, Carlos III Institute of Health, Madrid 28029, Spain.
Ann Transl Med. 2017 Sep;5(17):351. doi: 10.21037/atm.2017.07.12.
In multivariate analysis, independent variables are often correlated with one another, which can introduce multicollinearity into regression models. One way to address this problem is to apply principal component analysis (PCA) to these variables. PCA uses an orthogonal transformation to represent a set of potentially correlated variables as principal components (PCs) that are linearly uncorrelated. The PCs are ordered so that the first has the largest possible variance and each subsequent PC has the largest variance possible while remaining orthogonal to the preceding ones; only the leading components are retained to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in the R environment, using a simulated dataset in which two PCs account for the majority of the variance in the data. The visualization of PCA results is also highlighted.
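The workflow described in the abstract can be sketched in base R as follows. This is a minimal illustration, not the tutorial's own code: the simulated data, the variable names (f1, f2, x1 to x6, dat), the noise level, and the seed are all assumptions chosen so that two latent factors drive six correlated variables. Only functions from the base stats package (prcomp, screeplot, biplot) are used.

```r
# Hypothetical simulated data: two latent factors (f1, f2) drive six
# observed variables, so two PCs should capture most of the variance.
set.seed(123)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.3)  # assumed noise level

dat <- data.frame(
  x1 = f1 + noise(), x2 = f1 + noise(), x3 = f1 + noise(),
  x4 = f2 + noise(), x5 = f2 + noise(), x6 = f2 + noise()
)

# PCA on centered and scaled variables (orthogonal transformation)
pca <- prcomp(dat, center = TRUE, scale. = TRUE)

# Proportion and cumulative proportion of variance per component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var  <- cumsum(prop_var)
print(round(cum_var, 3))

# Correlation matrix of the component scores; off-diagonals are ~0
score_cor <- cor(pca$x)

# Visualization: scree plot of variances and biplot of the first two PCs
screeplot(pca, type = "lines", main = "Scree plot")
biplot(pca)
```

If the retained components are to replace the correlated predictors in a regression model, the scores in pca$x[, 1:2] could be used as the new, uncorrelated covariates.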