Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
PLoS Genet. 2021 Apr 8;17(4):e1008973. doi: 10.1371/journal.pgen.1008973. eCollection 2021 Apr.
Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when the expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the average power of the ACAT test combining sCCA features with single-tissue tests exceeded that of single-tissue tests combined with the Generalized Berk-Jones (GBJ) method, single-tissue tests combined with S-MultiXcan, UTMOST, and cross-tissue expression patterns summarized by Principal Component Analysis (PCA) by 5%, 8%, 5% and 38%, respectively. The gain in power likely arises because sCCA cross-tissue features are more often detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test increased the number of testable genes and identified on average an additional 400 gene-trait associations that single-tissue TWAS missed.
Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling the false positive rate.
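To illustrate how an ACAT-style test can merge several per-feature TWAS p-values for one gene into a single p-value, here is a minimal Python sketch of the standard Cauchy combination rule. It is not the authors' implementation: the function name `acat`, the equal default weights, and the example p-values are assumptions made for illustration only.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Hypothetical helper for illustration; equal weights are assumed
    when none are given.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total_weight = sum(weights)
    # Map each p-value to a standard Cauchy variate; a small p-value
    # becomes a large positive value and dominates the weighted mean.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Made-up p-values for one gene: two cross-tissue features and two
# single-tissue models (illustrative numbers, not results from the paper).
combined = acat([1e-6, 0.5, 0.8, 0.3])
print(f"combined p-value: {combined:.3g}")
```

Because the transform is heavy-tailed, one strongly significant feature can drive the combined result even when the other features show no signal, which is consistent with combining cross-tissue and single-tissue tests without knowing in advance which one captures the causal expression pattern.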