Wang Xiaojing, Kammerer Candace M, Anderson Stewart, Lu Jiang, Feingold Eleanor
Department of Oral Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
Genet Epidemiol. 2009 May;33(4):325-31. doi: 10.1002/gepi.20384.
Principal component analysis (PCA) and factor analysis (FA) are often used to uncover genetic factors that contribute to complex disease phenotypes. The purpose of such an analysis is to distill a genetic signal from a large number of correlated phenotype measurements. That signal can then be used in genetic analyses (e.g. linkage analysis), presumably leading to greater success at finding genes than one would achieve with any single raw trait. Although both PCA and FA have been used this way, their performance has not been compared in the literature. We compared the ability of these two procedures to extract unobserved underlying genetic components from complex simulated data on nuclear families. We first simulated seven underlying genetically and environmentally determined traits. We then derived two sets of 50 complex (observed) traits using algebraic combinations of the underlying components, and performed PCA and FA on the complex traits. We assessed two aspects of the performance of the methods: (1) their ability to detect the underlying genetic components; and (2) whether they worked better when applied to raw traits or to residuals (after regressing out significant environmental covariates). Our results indicate that the two methods behave similarly in most cases, although FA generally produced factors that had stronger correlations with the underlying traits. We also found that using residuals in PCA or FA greatly increased the probability that the PCs or factors detected common genetic components rather than common environmental factors, except when there was statistical interaction between genetic and environmental factors.
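The residual step described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: it assumes unrelated individuals rather than nuclear families, a single latent genetic component `g`, a single observed environmental covariate `e`, and ten observed traits built as linear combinations of the two plus noise. All names, sample sizes, and coefficient ranges are hypothetical choices made for illustration, and only PCA is shown (FA is omitted to keep the example short).

```python
import numpy as np

def first_pc_scores(X):
    """Scores on the first principal component of the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance explained.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def residualize(Y, covariate):
    """Residuals of each column of Y after least-squares regression on the covariate."""
    design = np.column_stack([np.ones(len(covariate)), covariate])  # intercept + covariate
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ beta

def simulate(n=2000, n_traits=10, seed=0):
    """Toy data (assumed design): traits = a*g + b*e + noise, with e weighted more heavily."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)            # latent genetic component (unobserved in practice)
    e = rng.standard_normal(n)            # observed environmental covariate
    a = rng.uniform(0.5, 1.0, n_traits)   # genetic weights
    b = rng.uniform(1.0, 2.0, n_traits)   # larger environmental weights
    Y = np.outer(g, a) + np.outer(e, b) + rng.standard_normal((n, n_traits))
    return g, e, Y

def abs_corr(u, v):
    """Absolute Pearson correlation (the sign of a principal component is arbitrary)."""
    return abs(np.corrcoef(u, v)[0, 1])

g, e, Y = simulate()
pc_raw = first_pc_scores(Y)                   # PCA on raw traits
pc_res = first_pc_scores(residualize(Y, e))   # PCA on covariate-adjusted residuals

print("raw PC1:      |corr with g| =", round(abs_corr(pc_raw, g), 3),
      " |corr with e| =", round(abs_corr(pc_raw, e), 3))
print("residual PC1: |corr with g| =", round(abs_corr(pc_res, g), 3),
      " |corr with e| =", round(abs_corr(pc_res, e), 3))
```

Under these assumptions the first PC of the raw traits tends to track the environmental covariate, while the first PC of the residuals tracks the genetic component. The toy model is purely additive, so it does not reproduce the exception noted in the abstract for gene-by-environment interaction.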