Improving GWAS performance in underrepresented groups by appropriate modeling of genetics, environment, and sociocultural factors.

Authors

Cataldo-Ramirez Chelsea C, Lin Meng, Mcmahon Aislinn, Gignoux Christopher R, Weaver Timothy D, Henn Brenna M

Affiliations

Department of Anthropology, University of California Davis, Davis, CA, 95616, USA.

Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, CA 91001, USA.

Publication information

bioRxiv. 2024 Oct 29:2024.10.28.620716. doi: 10.1101/2024.10.28.620716.

DOI:10.1101/2024.10.28.620716

PMID:39553939

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11565798/

Abstract

Genome-wide association studies (GWAS) and polygenic score (PGS) development are typically constrained by the data available in biobank repositories, in which European cohorts are vastly overrepresented. Here, we increase the utility of non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, "White and Asian" (WA), and "Any Other Asian" (AOA), towards creating a more robust South Asian sample size for future genetic analyses. We assess the relationships between genetic structure and self-selected ethnic identities, resulting in consistent patterns of clustering used to train a support vector machine (SVM). The SVM model was utilized to reassign n = 1,853 AOA and WA participants at the subcontinental level, and to increase the sample size of the UKB South Asian group by 1,381 additional participants. We then leverage these samples to assess GWAS performance and PGS development. We further include environmental covariates in the height GWAS by implementing a rigorous covariate selection procedure, and compare the outputs of two GWAS models, one without and one with the environmental covariates. We show that PGS performance derived from environmentally adjusted GWAS yields comparable prediction to PGS models developed with an order of magnitude larger training dataset (R² = 0.021 vs 0.026). Models with 7-8 environmental covariates double the variance explained by PGS alone. In summary, we demonstrate how GWAS performance can be improved by leveraging ambiguous ethnicity codes, ancestry-matched imputation panels, and including environmental covariates.
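
The reassignment step described in the abstract (train a classifier on participants whose self-identified labels cluster consistently, then classify participants with ambiguous codes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two synthetic "principal component" features, the binary labels, the participant IDs, and the use of a linear SVM fitted by plain stochastic subgradient descent. The abstract does not state the kernel, features, or software used, and the study assigns participants at the subcontinental level rather than into two groups. In practice a library implementation such as scikit-learn's `SVC` would likely be used instead of this hand-written trainer.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Fit a linear soft-margin SVM by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels in y must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # The L2 regularizer shrinks the weights slightly on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:
                # Margin violated: move the hyperplane toward sample i's side.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Synthetic stand-ins for the first two genetic principal components.
# These are NOT UK Biobank data; the cluster centers are arbitrary.
data_rng = random.Random(42)

def cluster(cx, cy, n):
    return [[data_rng.gauss(cx, 0.4), data_rng.gauss(cy, 0.4)] for _ in range(n)]

reference_group = cluster(2.0, 0.0, 30)    # hypothetical training label +1
comparison_group = cluster(-2.0, 0.0, 30)  # hypothetical training label -1
X = reference_group + comparison_group
y = [1] * 30 + [-1] * 30

w, b = train_linear_svm(X, y)

# Hypothetical participants with ambiguous self-identified codes.
ambiguous = {"participant_a": [1.8, 0.3], "participant_b": [-2.1, -0.2]}
assignments = {pid: predict(w, b, pcs) for pid, pcs in ambiguous.items()}
print(assignments)
```

The design point is that only participants with consistent label-to-cluster relationships are used for training, so the classifier learns the genetic structure and is then applied to the ambiguous codes.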

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0970/11565798/bff9bf334bd3/nihpp-2024.10.28.620716v1-f0001.jpg
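
The abstract's comparison of variance explained by a PGS alone versus a PGS plus environmental covariates can be sketched as a comparison of R² between two nested linear models. The example below is only an illustration under stated assumptions: the data are simulated, the effect sizes and the two covariates (`env_a`, `env_b`) are invented, and `ols_r2` is a hypothetical helper written for this sketch. It does not reproduce the study's covariate selection procedure or its reported values.

```python
import random

def ols_r2(columns, y):
    """In-sample R^2 of an ordinary least squares fit of y on the given
    predictor columns plus an intercept (normal equations, Gauss-Jordan)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    for c in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * pc for a, pc in zip(A[r], A[c])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Simulated data: all effect sizes below are assumptions, not study estimates.
rng = random.Random(7)
n = 500
pgs = [rng.gauss(0, 1) for _ in range(n)]
env_a = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical environmental covariate
env_b = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical environmental covariate
height = [0.15 * g + 0.30 * a + 0.20 * e + rng.gauss(0, 1)
          for g, a, e in zip(pgs, env_a, env_b)]

r2_pgs_only = ols_r2([pgs], height)
r2_with_env = ols_r2([pgs, env_a, env_b], height)
print(f"R^2, PGS only:         {r2_pgs_only:.3f}")
print(f"R^2, PGS + covariates: {r2_with_env:.3f}")
```

Because the second model nests the first, its in-sample R² can never be lower; a fair comparison of prediction would evaluate both models on held-out participants.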

Similar articles

Improving GWAS performance in underrepresented groups by appropriate modeling of genetics, environment, and sociocultural factors.

bioRxiv. 2024 Oct 29:2024.10.28.620716. doi: 10.1101/2024.10.28.620716.

Analysis of common genetic variation and rare CNVs in the Australian Autism Biobank.

Mol Autism. 2021 Feb 10;12(1):12. doi: 10.1186/s13229-020-00407-5.

Genetic Association and Transferability for Urinary Albumin-Creatinine Ratio as a Marker of Kidney Disease in four Sub-Saharan African Populations and non-continental Individuals of African Ancestry.

medRxiv. 2024 Apr 12:2024.01.17.24301398. doi: 10.1101/2024.01.17.24301398.

The accuracy of polygenic score models for anthropometric traits and Type II Diabetes in the Native Hawaiian Population.

medRxiv. 2023 Dec 28:2023.12.25.23300499. doi: 10.1101/2023.12.25.23300499.

Stability of polygenic scores across discovery genome-wide association studies.

HGG Adv. 2022 Jan 21;3(2):100091. doi: 10.1016/j.xhgg.2022.100091. eCollection 2022 Apr 14.

The power of TOPMed imputation for the discovery of Latino-enriched rare variants associated with type 2 diabetes.

Diabetologia. 2023 Jul;66(7):1273-1288. doi: 10.1007/s00125-023-05912-9. Epub 2023 May 6.

Large trans-ethnic meta-analysis identifies AKR1C4 as a novel gene associated with age at menarche.

Hum Reprod. 2021 Jun 18;36(7):1999-2010. doi: 10.1093/humrep/deab086.

Association of Novel Loci With Keratoconus Susceptibility in a Multitrait Genome-Wide Association Study of the UK Biobank Database and Canadian Longitudinal Study on Aging.

JAMA Ophthalmol. 2022 Jun 1;140(6):568-576. doi: 10.1001/jamaophthalmol.2022.0891.

Trans-ancestry polygenic models for the prediction of LDL blood levels: an analysis of the United Kingdom Biobank and Taiwan Biobank.

Front Genet. 2023 Nov 23;14:1286561. doi: 10.3389/fgene.2023.1286561. eCollection 2023.

Ethnic disparities in fracture risk assessment using polygenic scores.

Osteoporos Int. 2023 May;34(5):943-953. doi: 10.1007/s00198-023-06712-y. Epub 2023 Feb 25.

References

Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization.

Nat Genet. 2024 Oct;56(10):1989-1994. doi: 10.1038/s41588-024-01937-x.

Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups.

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae014.

A benchmark study on current GWAS models in admixed populations.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad437.

Inferring disease architecture and predictive ability with LDpred2-auto.

Am J Hum Genet. 2023 Dec 7;110(12):2042-2055. doi: 10.1016/j.ajhg.2023.10.010. Epub 2023 Nov 8.

Mexican Biobank advances population and medical genomics of diverse ancestries.

Nature. 2023 Oct;622(7984):775-783. doi: 10.1038/s41586-023-06560-0. Epub 2023 Oct 11.

Genetic distance informs polygenic score predictive accuracy.

Trends Genet. 2023 Nov;39(11):813-815. doi: 10.1016/j.tig.2023.07.002. Epub 2023 Jul 29.

Polygenic scoring accuracy varies across the genetic ancestry continuum.

Nature. 2023 Jun;618(7966):774-781. doi: 10.1038/s41586-023-06079-4. Epub 2023 May 17.

Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals.

Nat Genet. 2023 Apr;55(4):549-558. doi: 10.1038/s41588-023-01338-6. Epub 2023 Mar 20.

Addressing the challenges of polygenic scores in human genetic research.

Am J Hum Genet. 2022 Dec 1;109(12):2095-2100. doi: 10.1016/j.ajhg.2022.10.012.

The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource.

Nucleic Acids Res. 2023 Jan 6;51(D1):D977-D985. doi: 10.1093/nar/gkac1010.
