A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics.

Affiliation

Mathematical Sciences, Michigan Technological University, Houghton, MI, USA.

Publication information

Sci Rep. 2023 Feb 28;13(1):3389. doi: 10.1038/s41598-023-30415-3.

Abstract

There is strong evidence that joint analysis of multiple phenotypes in genome-wide association studies (GWAS) can increase statistical power when detecting associations between genetic variants and human complex diseases. We previously developed the Clustering Linear Combination (CLC) method and a computationally efficient CLC (ceCLC) method to test the association between multiple phenotypes and a genetic variant; both perform very well. However, both methods require individual-level genotypes and phenotypes, which are often not easily accessible. In this research, we develop a novel method called sCLC for association studies of multiple phenotypes and a genetic variant based on GWAS summary statistics. We use LD score regression to estimate the correlation matrix among phenotypes. The test statistic of sCLC is constructed from GWAS summary statistics and has an approximate Cauchy distribution. We perform a variety of simulation studies and compare sCLC with other commonly used methods for multiple phenotype association studies using GWAS summary statistics. Simulation results show that sCLC controls Type I error rates well and has the highest power in most scenarios. Moreover, we apply the newly developed method to the UK Biobank GWAS summary statistics from the XIII category, which comprises 70 related musculoskeletal system and connective tissue phenotypes. The results demonstrate that sCLC detects the largest number of significant SNPs, and most of these identified SNPs can be matched to genes that have been reported in the GWAS catalog to be associated with those phenotypes. Furthermore, sCLC also identifies some novel signals that were missed by standard GWAS, which provide new insight into the potential genetic factors of the musculoskeletal system and connective tissue phenotypes.
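The abstract states that the sCLC test statistic has an approximate Cauchy distribution but does not give its form. As a rough illustration of why a Cauchy reference distribution is convenient, the sketch below implements the generic Cauchy combination rule for merging several p-values into one. It is not the sCLC statistic itself: the function name `cauchy_combine`, the equal default weights, and the idea that the inputs are p-values from several candidate tests are all assumptions made for this example.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Illustrative sketch only; hypothetical helper, not the sCLC statistic.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0] * n
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise to sum to 1

    # Each p-value is mapped to a standard Cauchy variate; the weighted
    # sum of such variates is again approximately standard Cauchy.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))

    # Upper-tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    print(cauchy_combine([0.01, 0.5]))
```

The combined p-value has a closed form because the tail of a Cauchy variate is given by the arctangent, so no permutation or numerical integration is needed in this sketch.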

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9466/9975197/236e17f69190/41598_2023_30415_Fig1_HTML.jpg
