Powerful mapping of cis-genetic effects on gene expression across diverse populations reveals novel disease-critical genes.

Authors

Akamatsu Kai, Golzari Stephen, Amariuta Tiffany

Affiliations

School of Biological Sciences, UC San Diego, La Jolla, CA, USA.

Department of Medicine, Division of Biomedical Informatics, UC San Diego, La Jolla, CA, USA.

Publication information

medRxiv. 2024 Sep 26:2024.09.25.24314410. doi: 10.1101/2024.09.25.24314410.

Abstract

While disease-associated variants identified by genome-wide association studies (GWAS) most likely regulate gene expression levels, linking variants to target genes is critical to determining the functional mechanisms of these variants. Genetic effects on gene expression have been extensively characterized by expression quantitative trait loci (eQTL) studies, yet data from non-European populations is limited. This restricts our understanding of disease to genes whose regulatory variants are common in European populations. While previous work has leveraged data from multiple populations to improve GWAS power and polygenic risk score (PRS) accuracy, multi-ancestry data has not yet been used to better estimate cis-genetic effects on gene expression. Here, we present a new method, Multi-Ancestry Gene Expression Prediction Regularized Optimization (MAGEPRO), which constructs robust genetic models of gene expression in understudied populations or cell types by fitting a regularized linear combination of eQTL summary data across diverse cohorts. In simulations, our tool generates more accurate models of gene expression than the widely used LASSO and the state-of-the-art multi-ancestry PRS method, PRS-CSx, adapted to gene expression prediction. We attribute this improvement to MAGEPRO's ability to more accurately estimate causal eQTL effect sizes (p < 3.98 × 10, two-sided paired t-test). With real data, we applied MAGEPRO to 8 eQTL cohorts representing 3 ancestries (average n = 355) and consistently outperformed each of 6 competing methods in gene expression prediction tasks. Integration with GWAS summary statistics across 66 complex traits (representing 22 phenotypes and 3 ancestries) resulted in 2,331 new gene-trait associations, many of which replicate across multiple ancestries, including one for a gene linked to white blood cell count that is overexpressed in leukemia patients. MAGEPRO also identified biologically plausible novel findings, such as an association between an essential component of GPI biosynthesis and heart failure, which has been previously evidenced by clinical outcome data. Overall, MAGEPRO is a powerful tool to enhance inference of gene regulatory effects in underpowered datasets and has improved our understanding of population-specific and shared genetic effects on complex traits.
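
The phrase "a regularized linear combination of eQTL summary data across diverse cohorts" can be illustrated with a small sketch. The code below is not the MAGEPRO implementation. It assumes ridge regression as the regularizer, uses simulated genotypes and effect sizes, and every name in it (`fit_cohort_combination`, `ridge_lambda`, `cohort_effects`) is hypothetical. Each external cohort's SNP effect sizes are turned into one predictor of expression in the target cohort, and a penalized regression then learns how much weight each cohort receives.

```python
import numpy as np


def fit_cohort_combination(genotypes, expression, cohort_effects, ridge_lambda=1.0):
    """Learn one weight per external cohort with ridge regression (an assumption).

    genotypes:      (n_samples, n_snps) genotype matrix of the target cohort
    expression:     (n_samples,) measured expression of one gene
    cohort_effects: (n_snps, n_cohorts) eQTL effect sizes, one column per cohort
    """
    # One feature per cohort: expression predicted from that cohort's effect sizes.
    features = genotypes @ cohort_effects
    features = features - features.mean(axis=0)
    centered_expression = expression - expression.mean()

    # Closed-form ridge solution: (F'F + lambda * I)^-1 F'y
    n_cohorts = features.shape[1]
    gram = features.T @ features + ridge_lambda * np.eye(n_cohorts)
    cohort_weights = np.linalg.solve(gram, features.T @ centered_expression)

    # Collapse the cohort weights back into a single SNP-level model.
    snp_weights = cohort_effects @ cohort_weights
    return cohort_weights, snp_weights


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n_samples, n_snps, n_cohorts = 200, 50, 3
genotypes = rng.standard_normal((n_samples, n_snps))

true_effects = np.zeros(n_snps)
true_effects[[3, 17]] = [0.5, -0.4]  # two causal SNPs
expression = genotypes @ true_effects + 0.5 * rng.standard_normal(n_samples)

# Each external cohort provides a noisy estimate of the true effects.
cohort_effects = true_effects[:, None] + 0.05 * rng.standard_normal((n_snps, n_cohorts))

cohort_weights, snp_weights = fit_cohort_combination(genotypes, expression, cohort_effects)
predicted = genotypes @ snp_weights
correlation = np.corrcoef(predicted, expression)[0, 1]
print("cohort weights:", cohort_weights)
print("in-sample correlation:", correlation)
```

The correlation printed here is measured on the same samples used for fitting, so it overstates accuracy. A realistic evaluation would use held-out samples or cross-validation.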

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a08d/11469471/6625b10d39cc/nihpp-2024.09.25.24314410v1-f0001.jpg