Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America.
PLoS One. 2022 Apr 28;17(4):e0260911. doi: 10.1371/journal.pone.0260911. eCollection 2022.
Interest in the joint analysis of multiple phenotypes in genome-wide association studies (GWAS) has grown because analyzing phenotypes jointly may increase the statistical power to detect genetic variants associated with complex diseases or traits. Many statistical methods have recently been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings. However, because the number of clusters in a given data set is unknown, the final test statistic of the CLC method is the minimum p-value among the p-values of the CLC test statistics obtained for each possible number of clusters. Therefore, a simulation procedure is needed to evaluate the p-value of the final test statistic, which makes the CLC method computationally demanding. We develop a new method, called computationally efficient CLC (ceCLC), to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic, as the CLC method does, ceCLC uses the Cauchy combination test to combine the p-values of the CLC test statistics obtained for each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so its p-value can be obtained from the cumulative distribution function without a simulation procedure. Extensive simulation studies and an application to the COPDGene data show that ceCLC effectively controls the type I error rate across simulation settings, and that ceCLC either outperforms all other methods it was compared with or has statistical power very close to that of the most powerful one.
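To make the combining step concrete, the following is a minimal sketch of a generic Cauchy combination test, not the authors' implementation. The function name `cauchy_combination`, the default of equal weights, and the example p-values are assumptions made for illustration. The abstract does not state which weights ceCLC uses or how the per-cluster-number CLC p-values are computed, so they are taken as given inputs here.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with a Cauchy combination test.

    Returns (statistic, combined_p). Equal weights are an assumption of
    this sketch; any non-negative weights that sum to 1 may be passed.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0 / n] * n  # assumed default: equal weights
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of a standard Cauchy distribution at the statistic.
    combined_p = 0.5 - math.atan(statistic) / math.pi
    return statistic, combined_p


if __name__ == "__main__":
    # Hypothetical p-values, one per candidate number of clusters.
    clc_p_values = [0.012, 0.030, 0.25, 0.40]
    stat, p = cauchy_combination(clc_p_values)
    print(f"statistic = {stat:.4f}, combined p-value = {p:.4g}")
```

Because the combined p-value comes from a closed-form tail probability, no permutation or Monte Carlo loop is needed. This is the source of the computational saving over evaluating a minimum-p-value statistic by simulation.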