Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
Nat Commun. 2022 Sep 28;13(1):5704. doi: 10.1038/s41467-022-33212-0.
A majority of the variants identified in genome-wide association studies fall in non-coding regions of the genome, suggesting that their effects are mediated through gene expression. Leveraging this hypothesis, transcriptome-wide association studies (TWAS) have assisted in both the interpretation and discovery of additional genes associated with complex traits. However, existing methods for conducting TWAS do not take full advantage of the intra-individual correlation inherently present in multi-context expression studies and do not properly adjust for multiple testing across contexts. We introduce CONTENT, a computationally efficient method with proper cross-context false discovery correction that leverages correlation structure across contexts to improve power and generate context-specific and context-shared components of expression. We apply CONTENT to bulk multi-tissue and single-cell RNA-seq data sets and show that CONTENT leads to a 42% (bulk) and 110% (single-cell) increase in the number of genetically predicted genes relative to previous approaches. We find that the context-specific component of expression comprises 30% of heritability in tissue-level bulk data and 75% in single-cell data, consistent with cell-type heterogeneity in bulk tissue. In the context of TWAS, CONTENT increases the number of locus-phenotype associations discovered by over 51% relative to previous methods across 22 complex traits.
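The abstract does not state how the context-shared and context-specific components of expression are computed. As an illustration only, the sketch below assumes one simple way such a split could be made for a single gene: the shared component is each individual's mean expression across contexts, and the specific component is the per-context deviation from that mean. The function name `decompose_expression`, the array layout, and the simulated data are all hypothetical and are not taken from CONTENT.

```python
import numpy as np


def decompose_expression(expr):
    """Split one gene's expression into shared and context-specific parts.

    ASSUMPTION: this mean/deviation split is an illustrative choice,
    not a description of the published method.

    expr: array of shape (n_individuals, n_contexts); NaN marks a
          context in which an individual was not measured.
    Returns (shared, specific):
      shared   -- shape (n_individuals,), mean over observed contexts
      specific -- shape (n_individuals, n_contexts), deviation from it
    """
    expr = np.asarray(expr, dtype=float)
    shared = np.nanmean(expr, axis=1)      # per-individual mean across contexts
    specific = expr - shared[:, None]      # per-context deviation from that mean
    return shared, specific


# Simulated toy data (hypothetical): an individual-level effect that is
# common to all contexts, plus context-level variation.
rng = np.random.default_rng(0)
n_individuals, n_contexts = 5, 3
individual_effect = rng.normal(size=(n_individuals, 1))
context_effect = rng.normal(scale=0.5, size=(n_individuals, n_contexts))
expr = individual_effect + context_effect

shared, specific = decompose_expression(expr)
```

Under this assumed split, the two parts add back to the original matrix and the specific part averages to zero within each individual, so separate genetic predictors could in principle be fit to each component.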