Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects.

Authors

Karczewski Konrad J, Gupta Rahul, Kanai Masahiro, Lu Wenhan, Tsuo Kristin, Wang Ying, Walters Raymond K, Turley Patrick, Callier Shawneequa, Shah Nirav N, Baya Nikolas, Palmer Duncan S, Goldstein Jacqueline I, Sarma Gopal, Solomonson Matthew, Cheng Nathan, Bryant Sam, Churchhouse Claire, Cusick Caroline M, Poterba Timothy, Compitello John, King Daniel, Zhou Wei, Seed Cotton, Finucane Hilary K, Daly Mark J, Neale Benjamin M, Atkinson Elizabeth G, Martin Alicia R

Affiliations

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.

Publication information

Nat Genet. 2025 Sep 18. doi: 10.1038/s41588-025-02335-7.

DOI:10.1038/s41588-025-02335-7

PMID:40968291

Abstract

Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, people from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UK Biobank than previous efforts, to produce freely available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (P < 5 × 10⁻⁸) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including new associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside frequently asked questions that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.