Queensland Institute of Medical Research, Brisbane, QLD 4029, Australia.
Bioinformatics. 2012 Mar 15;28(6):845-50. doi: 10.1093/bioinformatics/bts051. Epub 2012 Jan 31.
Canonical correlation analysis (CCA) measures the association between two sets of multidimensional variables. We reasoned that CCA could provide an efficient and powerful approach for both univariate and multivariate gene-based tests of association without the need for permutation testing.
Compared with a commonly used permutation-based approach, CCA (i) is faster; (ii) has an appropriate type-I error rate for normally distributed quantitative traits; (iii) provides comparable power for small to medium-sized genes (<100 kb); (iv) provides greater power when the causal variants are uncommon; and (v) provides considerably less power for larger genes (≥100 kb) when the causal variants have a broad minor allele frequency (MAF) spectrum. Applied to a GWAS of leukocyte levels, CCA identified SAFB and a histone gene cluster as novel putative loci harboring multiple independent variants that regulate lymphocyte and neutrophil counts.
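To make the idea of a permutation-free, CCA-based gene test concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the canonical correlations between the SNPs in a gene and one or more traits are summarised by Wilks' lambda and converted to a p-value with Rao's F approximation; the abstract does not state which test statistic was used, so that choice is an assumption. The function name `cca_gene_test` and the simulated data are hypothetical, and the sketch assumes the genotype matrix has full column rank (for example, after pruning SNPs in very high linkage disequilibrium).

```python
import numpy as np
from scipy import stats


def cca_gene_test(genotypes, traits):
    """Hypothetical helper: CCA between a gene's SNPs and one or more traits.

    genotypes: (n, p) array of allele counts for the p SNPs in one gene.
    traits:    (n,) or (n, q) array of quantitative traits.
    Returns the canonical correlations, Rao's F statistic and its p-value.
    Assumes both centred matrices have full column rank.
    """
    X = np.asarray(genotypes, dtype=float)
    Y = np.asarray(traits, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    n, p = X.shape
    q = Y.shape[1]

    # Centre each column, then orthonormalise both blocks with a reduced QR.
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))

    # The singular values of Qx' Qy are the canonical correlations.
    r = np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)

    # Wilks' lambda and Rao's F approximation (assumed test statistic).
    wilks = np.prod(1.0 - r**2)
    denom = p**2 + q**2 - 5
    s = np.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    m = n - 1.5 - (p + q) / 2.0
    df1 = p * q
    df2 = m * s - p * q / 2.0 + 1.0
    lam = wilks ** (1.0 / s)
    f_stat = (1.0 - lam) / lam * df2 / df1
    p_value = stats.f.sf(f_stat, df1, df2)
    return r, f_stat, p_value


# Simulated example (made-up data): 5 SNPs, one of which affects the trait.
rng = np.random.default_rng(0)
n_samples = 500
snps = rng.binomial(2, 0.3, size=(n_samples, 5)).astype(float)
trait = 0.5 * snps[:, 0] + rng.normal(size=n_samples)
canon_corr, f_stat, p_value = cca_gene_test(snps, trait)
```

With a single trait (q = 1) the one canonical correlation squared equals the R² of a multiple regression of the trait on the SNPs, and the statistic reduces to the usual regression F-test with p and n − p − 1 degrees of freedom. Passing a matrix with several trait columns gives the multivariate version of the same test.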