Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA.
Bioinformatics. 2019 Apr 15;35(8):1373-1379. doi: 10.1093/bioinformatics/bty810.
Interest in the joint analysis of multiple phenotypes in genome-wide association studies (GWASs) is increasing for three reasons. First, cohorts usually collect multiple phenotypes, and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases. Third, there is increasing evidence that pleiotropy is widespread in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes in GWASs. The CLC method first clusters the individual statistics into positively correlated clusters, then combines the individual statistics linearly within each cluster and combines the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of the individual statistics but also reduces the degrees of freedom of the test statistic. We also prove theoretically that, if the individual statistics are clustered correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has power similar to that of the most powerful test among the tests we compared, and that CLC is much more powerful than the other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study.
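The "linear within clusters, quadratic between clusters" idea can be illustrated with a minimal sketch. It is not the authors' R implementation, and the abstract does not state the formula. The sketch assumes the individual statistics `t` are jointly normal with a known covariance `sigma`, that cluster labels are already given (the clustering step is skipped), and that the combined statistic has the form U' V^{-1} U, where U collects inverse-covariance-weighted sums of the statistics within each cluster and V is the covariance of U. The names `solve`, `dot`, and `clc_statistic` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(x, y):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(x, y))


def clc_statistic(t, sigma, labels):
    """Assumed form: U' V^{-1} U with U = B' sigma^{-1} t and V = B' sigma^{-1} B.

    t      : individual statistics, one per phenotype
    sigma  : their covariance matrix (list of lists)
    labels : cluster index (0..L-1) of each phenotype; B is the indicator matrix
    """
    n_clusters = max(labels) + 1
    # Column c of B marks the phenotypes that belong to cluster c.
    cols = [[1.0 if lab == c else 0.0 for lab in labels] for c in range(n_clusters)]
    sinv_t = solve(sigma, t)                         # sigma^{-1} t
    sinv_cols = [solve(sigma, col) for col in cols]  # sigma^{-1} B, column by column
    u = [dot(col, sinv_t) for col in cols]           # linear combination within clusters
    v = [[dot(ci, sj) for sj in sinv_cols] for ci in cols]  # covariance of u
    return dot(u, solve(v, u))                       # quadratic form between clusters


# Toy input: four uncorrelated statistics, two positive and two negative.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
t = [1.0, 1.0, -1.0, -1.0]
print(clc_statistic(t, identity, [0, 0, 1, 1]))  # two clusters matching the signs: 4.0
print(clc_statistic(t, identity, [0, 0, 0, 0]))  # one cluster, opposite signs cancel: 0.0
print(clc_statistic(t, identity, [0, 1, 2, 3]))  # one cluster per statistic: 4.0
```

In this toy case, grouping the statistics by sign gives the same value as the full quadratic form while using two clusters instead of four, whereas a single linear combination loses the signal because the opposite signs cancel. Under the stated normality assumption, a statistic of this form would follow a chi-square distribution with as many degrees of freedom as clusters when the null hypothesis holds.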
R code for implementing our method is available at http://www.math.mtu.edu/~shuzhang/software.html.
Supplementary data are available at Bioinformatics online.