Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery.

Affiliations

Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America.

Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America.

Publication information

PLoS Genet. 2022 Jun 30;18(6):e1009814. doi: 10.1371/journal.pgen.1009814. eCollection 2022 Jun.

Abstract

A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared to expression data, one challenge in detecting associations with splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and find that the majority of MSG-identified genes would have been missed by expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits.
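The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Stage 1 is approximated here by alternating soft-thresholded updates on the genotype-splicing cross-covariance, a simplified form of penalized sparse CCA with fixed penalties. Stage 2 uses a generic joint chi-square test on summary-statistic z-scores, which may differ from the statistic MSG actually uses. The function names `sparse_cca_pair` and `cv_association`, the penalties `lam_g` and `lam_s`, the cutoff `rcond`, and the simulated data are all hypothetical, and genotypes are assumed to be standardized so that the LD matrix `R` is a correlation matrix.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam` (L1 proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_cca_pair(G, S, lam_g=0.05, lam_s=0.05, n_iter=200):
    """Stage 1 (simplified): one sparse canonical pair.

    G: n x p genotype matrix, S: n x q splicing-event matrix.
    Returns SNP weights u (p,) and splicing weights v (q,).
    Assumption: fixed Lagrangian penalties instead of tuned L1 constraints.
    """
    G = G - G.mean(axis=0)
    S = S - S.mean(axis=0)
    C = G.T @ S / G.shape[0]                          # p x q cross-covariance
    v = np.linalg.svd(C, full_matrices=False)[2][0]   # dense starting point
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_g)
        if not u.any():                               # penalty removed everything
            break
        u /= np.linalg.norm(u)
        v_new = soft_threshold(C.T @ u, lam_s)
        if not v_new.any():
            break
        v = v_new / np.linalg.norm(v_new)
    return u, v


def cv_association(W, z, R, rcond=1e-3):
    """Stage 2 (generic): test K genetically predicted CVs jointly.

    W: p x K SNP weights (one column per CV), z: GWAS z-scores (p,),
    R: p x p LD correlation matrix from a reference panel.
    Returns per-CV z-scores, a joint statistic, and its degrees of freedom.
    Under the null the statistic is approximately chi-square with `df` d.o.f.
    """
    cov = W.T @ R @ W                     # covariance of the predicted CVs
    sd = np.sqrt(np.diag(cov))
    z_cv = (W.T @ z) / sd                 # summary-statistic z-score per CV
    corr = cov / np.outer(sd, sd)
    evals, evecs = np.linalg.eigh(corr)
    keep = evals > rcond * evals.max()    # drop near-collinear directions
    proj = evecs[:, keep].T @ z_cv
    stat = float(np.sum(proj ** 2 / evals[keep]))
    return z_cv, stat, int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((200, 10))                # simulated genotypes
    S = np.column_stack([G[:, 0] + 0.1 * rng.standard_normal(200),
                         rng.standard_normal((200, 2))])
    u, v = sparse_cca_pair(G, S)
    R = np.corrcoef(G, rowvar=False)                  # toy LD matrix
    z = rng.standard_normal(10)                       # toy GWAS z-scores
    z_cv, stat, df = cv_association(u[:, None], z, R)
    print("nonzero SNP weights:", int(np.count_nonzero(u)))
    print("CV z-score:", z_cv, "joint statistic:", stat, "df:", df)
```

Testing the CVs jointly at the gene level, rather than testing each splicing event separately, is what reduces the multiple testing burden mentioned in the abstract: each gene contributes one test instead of one per event.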

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/039f/9278751/3e3e67c859ff/pgen.1009814.g001.jpg
