Zhang Fengyu, Wagener Diane
Statistics and Epidemiology Unit, Research Triangle Institute, NC 27709, USA.
J Genet Genomics. 2008 Jun;35(6):381-5. doi: 10.1016/S1673-8527(08)60055-7.
In this study, we propose combining principal component analysis (PCA) with a regression model to incorporate linkage disequilibrium (LD) into the analysis of genomic association data. To accommodate LD and reduce multiple testing, we suggest performing PCA and extracting principal component scores that capture the variation in the genomic data, and then using regression analysis to assess the association of the disease with those scores. An empirical analysis shows that the genotype-based correlation matrix and the haplotype-based LD matrix produce similar PCA results. The principal component score appears to be more powerful in detecting genetic association because it is measured quantitatively and may capture the effect of multiple loci.
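The two-step idea in the abstract (PCA on correlated markers, then regression on the component score) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are simulated, and every name and parameter (`N_PEOPLE`, `LD_STRENGTH`, `draw_haplotype`, the effect size of 0.5, and so on) is hypothetical. Two simplifications are assumed: only the genotype-based correlation matrix is used, and the outcome is a quantitative trait fitted by ordinary least squares, whereas a binary disease status would normally call for logistic regression.

```python
import math
import random

random.seed(2008)  # fixed seed so the simulated data are reproducible

# Hypothetical simulation settings (not taken from the paper).
N_PEOPLE, N_SNPS, FREQ, LD_STRENGTH = 400, 5, 0.3, 0.9

def draw_haplotype():
    """One haplotype whose alleles mostly copy a shared founder allele (LD)."""
    founder = 1 if random.random() < FREQ else 0
    return [founder if random.random() < LD_STRENGTH
            else (1 if random.random() < FREQ else 0)
            for _ in range(N_SNPS)]

def draw_genotype():
    """Genotype = allele count (0, 1 or 2) at each SNP from two haplotypes."""
    return [a + b for a, b in zip(draw_haplotype(), draw_haplotype())]

def standardize(column):
    """Center a column and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(z_cols):
    """Pearson correlations between already standardized columns."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_cols]
            for ci in z_cols]

def first_component(matrix, iterations=500):
    """Power iteration: leading eigenvector and eigenvalue of a symmetric matrix."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(p)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

def simple_regression(x, y):
    """Least-squares slope of y on x and its t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err

# Simulate genotypes in LD and a trait influenced by two of the SNPs.
genotypes = [draw_genotype() for _ in range(N_PEOPLE)]
phenotype = [0.5 * (g[1] + g[3]) + random.gauss(0, 1) for g in genotypes]

# Step 1: PCA on the genotype-based correlation matrix.
columns = [standardize([g[j] for g in genotypes]) for j in range(N_SNPS)]
corr = correlation_matrix(columns)
loadings, eigenvalue = first_component(corr)
scores = [sum(loadings[j] * columns[j][i] for j in range(N_SNPS))
          for i in range(N_PEOPLE)]

# Step 2: a single regression test on the first principal component score.
slope, t_stat = simple_regression(scores, phenotype)
print(f"PC1 explains {eigenvalue / N_SNPS:.1%} of marker variance")
print(f"slope = {slope:.3f}, t = {t_stat:.2f}")
```

Because the five simulated markers are strongly correlated, one component summarizes most of their variation, so a single test on its score stands in for five marker-by-marker tests. Real data would usually need several components and a full eigendecomposition rather than power iteration.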