Yang James J, Li Jia, Williams L Keoki, Buu Anne
School of Nursing, University of Michigan, Ann Arbor, Michigan, USA.
Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan, USA.
BMC Bioinformatics. 2016 Jan 5;17:19. doi: 10.1186/s12859-015-0868-6.
In genome-wide association studies (GWAS) of complex diseases, the association between a SNP and any single phenotype is usually weak. Combining multiple related phenotypic traits can increase the power to detect associated genes, which makes it a practically important area in need of methodological work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes, including multivariate analysis of variance (MANOVA), principal component analysis (PCA), generalized estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of a statistical analysis of the Study of Addiction: Genetics and Environment (SAGE) data.
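For reference, the classical Fisher combination test named above can be sketched as follows. This is a minimal illustration of the classical test only, which assumes the per-phenotype p-values are independent; it is not the proposed method, whose details are not given in this abstract. The function name `fisher_combination` and the example p-values are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination test (assumes independent p-values).

    Returns (statistic, combined_p), where statistic = -2 * sum(log p_i)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k, the chi-square survival function has
    # the closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    combined_p = math.exp(-half) * total
    return statistic, combined_p


if __name__ == "__main__":
    # Hypothetical marginal p-values for one SNP across three phenotypes.
    stat, p = fisher_combination([0.04, 0.10, 0.03])
    print(f"statistic = {stat:.3f}, combined p = {p:.5f}")
```

When the phenotypes are correlated, the chi-square reference distribution used here no longer holds, which is the assumption the proposed method is described as relaxing.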
Our simulation study shows that the proposed method has higher power than existing methods while controlling the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long-tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and to identify which SNPs are specific to a particular phenotype and which contribute to the common construct.
The proposed method outperforms existing methods in most settings and is well suited to GWAS on complex diseases with multiple phenotypes, such as substance abuse disorders.