Wei Qianran, Chen Lili, Zhou Yajing, Wang Huiyi
Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China.
Genetica. 2023 Apr;151(2):97-104. doi: 10.1007/s10709-023-00179-9. Epub 2023 Jan 19.
Extensive evidence from genome-wide association studies (GWAS) has shown that jointly analyzing multiple phenotypes can improve the power of association tests compared with the traditional single-variant, single-trait approach. Here we propose an adaptive test based on principal components (ATPC) that is powerful and efficient for detecting the association between a single variant and multiple traits. Our method requires only GWAS summary statistics, which are often publicly available. We first estimate the trait correlation matrix by LD score regression. Then, based on this correlation matrix, we construct a series of test statistics that contain different numbers of principal components. The final test statistic combines the P-values of these principal component-based statistics using the aggregated Cauchy association test. A notable feature of the proposed method is that the analytical P-value of the test statistic can be computed quickly, without permutation. Extensive simulation studies demonstrate that ATPC controls the type I error rate and is powerful and robust compared with several existing tests across a wide range of simulation settings. Analysis of the lipid GWAS summary data from the Global Lipids Genetics Consortium shows that ATPC identifies 230 new SNPs that were missed by the original single-trait association analysis. A search of the GWAS Catalog shows that some of the SNPs and mapped genes identified by ATPC have been reported to be associated with lipid traits. Through further analysis of the GWAS results, we also find some Gene Ontology terms and biological pathways related to lipids.
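The pipeline described in the abstract (principal component-based statistics on the trait correlation matrix, then a Cauchy combination of their P-values) can be illustrated with a minimal sketch. The abstract does not give the exact form of the statistics, so the following is an assumption rather than the authors' implementation: components are ordered by decreasing eigenvalue, the k-th statistic is the sum of the first k squared standardized component scores (chi-square with k degrees of freedom under the null), and the Cauchy combination uses equal weights. The function names `acat` and `pc_adaptive_test` and the example inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def acat(pvals):
    """Equal-weight Cauchy combination of P-values (ACAT-style)."""
    p = np.asarray(pvals, dtype=float)
    # Map each P-value to a standard Cauchy variate and average them.
    t = np.mean(np.tan((0.5 - p) * np.pi))
    # Convert the averaged statistic back to a P-value analytically.
    return 0.5 - np.arctan(t) / np.pi


def pc_adaptive_test(z, R):
    """Sketch of an adaptive PC-based test for one variant.

    z : length-K vector of single-trait Z-scores (summary statistics).
    R : K x K trait correlation matrix (e.g. estimated by LD score
        regression; here it is simply taken as given).
    Assumption: PCs are ordered by decreasing eigenvalue and the k-th
    statistic uses the first k components.
    """
    z = np.asarray(z, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Squared standardized PC scores: each is chi-square(1) under the null.
    scores = (eigvecs.T @ z) ** 2 / eigvals
    # k-th statistic = sum of the first k scores, chi-square(k) under the null.
    cum_stats = np.cumsum(scores)
    dfs = np.arange(1, len(z) + 1)
    pvals = chi2.sf(cum_stats, dfs)

    # Combine the K P-values without permutation.
    return acat(pvals), pvals


# Illustrative, made-up inputs (not data from the paper).
R_example = np.array([[1.0, 0.5, 0.3],
                      [0.5, 1.0, 0.4],
                      [0.3, 0.4, 1.0]])
z_example = np.array([2.5, 3.0, 1.0])
p_combined, p_per_k = pc_adaptive_test(z_example, R_example)
print(p_combined, p_per_k)
```

Because the combined P-value is a closed-form function of the component P-values, no permutation step is needed, which matches the computational advantage the abstract highlights. With all K components included, the last statistic in this sketch reduces to the usual multivariate Wald-type statistic z' R^{-1} z.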