Control for population stratification in genetic association studies based on GWAS summary statistics.

Affiliation

Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA.

Publication information

Genet Epidemiol. 2022 Dec;46(8):604-614. doi: 10.1002/gepi.22493. Epub 2022 Jun 29.

Abstract

Over the past years, genome-wide association studies (GWAS) have generated a wealth of new information. Summary data from many GWAS are now publicly available, promoting the development of many statistical methods for association studies based on GWAS summary statistics, which avoids the increasing challenges associated with individual-level genotype and phenotype data sharing. However, for population-based association studies such as GWAS, it has been long recognized that population stratification can seriously confound association results. For large GWAS, it is very likely that there exist population stratification and cryptic relatedness, which will result in inflated Type I error in association testing. Although many methods have been developed to control for population stratification, only two of these approaches can be used to control population stratification without individual-level data: one is based on genomic control (GC) and the other one is based on linkage disequilibrium score regression (LDSC). However, the performance of these two approaches is currently unknown. In this study, we use extensive simulation studies including populations with subpopulations, spatially structured populations, and populations with cryptic relatedness to compare the performance of these two approaches to control for population stratification using only GWAS summary statistics without individual-level data. Data sets from the genetic analysis workshop 19 and UK Biobank are also used to evaluate these two approaches. We demonstrate that the intercept of LDSC can be used as a more accurate correction factor than GC. The results from this study will provide very useful information for researchers using GWAS summary statistics while trying to control for population stratification.
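As an illustration of the two correction factors compared in the abstract, the sketch below computes a GC inflation factor and an LDSC-style intercept from per-SNP chi-square statistics. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gc_lambda`, `ldsc_intercept`, `correct`) and the toy numbers are hypothetical, and the intercept is obtained by plain unweighted least squares, whereas the LDSC software uses a weighted regression.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def gc_lambda(chi2):
    """Genomic control factor: observed median chi-square over the null median."""
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def ldsc_intercept(chi2, ld_scores):
    """Intercept from an unweighted least-squares fit of chi-square on LD score.

    Simplification (assumption): real LDSC applies regression weights;
    this sketch omits them.
    """
    mean_x = statistics.fmean(ld_scores)
    mean_y = statistics.fmean(chi2)
    sxx = sum((x - mean_x) ** 2 for x in ld_scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ld_scores, chi2))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def correct(chi2, factor):
    """Divide each test statistic by the correction factor (never below 1)."""
    factor = max(factor, 1.0)
    return [c / factor for c in chi2]


# Hypothetical toy data: chi-square rises linearly with LD score.
ld = [10.0, 20.0, 30.0, 40.0, 50.0]
chi2 = [1.3, 1.5, 1.7, 1.9, 2.1]

lam = gc_lambda(chi2)
icpt = ldsc_intercept(chi2, ld)
corrected = correct(chi2, icpt)
```

In this toy data the statistics grow with LD score, so the median-based GC factor is much larger than the regression intercept. That pattern is the usual argument for the intercept: it aims to separate inflation that does not depend on LD from signal that scales with LD.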
