Department of Pathology and Laboratory Medicine, University of California-Los Angeles, Los Angeles, California, United States of America.
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
PLoS Genet. 2021 Mar 8;17(3):e1009398. doi: 10.1371/journal.pgen.1009398. eCollection 2021 Mar.
Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to the gene of interest and shrink parameters through regularization. These approaches ignore the effects of distal SNPs and of other molecular features underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics that allow more powerful testing of gene-trait associations by prioritizing SNPs distal to the gene of interest. In the first extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for the mediators are then incorporated into the final predictive model of gene expression, along with the gene's local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with the expression of genes outside a local window around them) for effects mediated through biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model together with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains in the percent variance explained of gene expression (1-2% additive increase) and in TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue, as well as important risk genes for various traits and disorders.
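The first extension described above (impute mediators from their local SNPs, then add the imputed mediators to the local-SNP model of gene expression) can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation: the ridge penalty, the effect sizes, the single mediator, and every name below (`ridge_fit`, `G_local`, `G_med`, and so on) are assumptions made for illustration. For brevity the mediator is imputed in-sample for the training individuals, whereas a careful analysis would use cross-validated imputations to limit overfitting.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression without intercept; returns coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def r2(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

rng = np.random.default_rng(0)
n, p_local, p_med = 1000, 20, 20  # hypothetical sample size and SNP counts

# Simulated standardized genotypes (assumption): SNPs local to the gene,
# and SNPs local to a single mediating biomarker elsewhere in the genome.
G_local = rng.standard_normal((n, p_local))
G_med = rng.standard_normal((n, p_med))

# The mediator depends on its own local SNPs; gene expression depends on
# the gene's local SNPs plus the mediator (illustrative effect sizes).
mediator = G_med @ rng.normal(0.0, 0.3, p_med) + rng.standard_normal(n)
expression = (G_local @ rng.normal(0.0, 0.2, p_local)
              + 0.5 * mediator
              + rng.standard_normal(n))

train, test = np.arange(0, 700), np.arange(700, n)

# Stage 1: predictive model of the mediator from its local SNPs,
# then genetically imputed mediator values for every individual.
w_med = ridge_fit(G_med[train], mediator[train])
med_hat = G_med @ w_med

# Stage 2: expression model on local SNPs plus the imputed mediator,
# compared against a local-SNP-only baseline.
X_full = np.column_stack([G_local, med_hat])
w_full = ridge_fit(X_full[train], expression[train])
w_local = ridge_fit(G_local[train], expression[train])

r2_local = r2(expression[test], G_local[test] @ w_local)
r2_full = r2(expression[test], X_full[test] @ w_full)
print(f"held-out R^2, local SNPs only:          {r2_local:.3f}")
print(f"held-out R^2, local + imputed mediator: {r2_full:.3f}")
```

Because the final model uses only genotypes at prediction time (the mediator enters through its genetically imputed value), it can still be applied to cohorts with germline data alone. The simulated effect sizes here are far larger than the 1-2% gains reported for real data, so the printed gap should not be read as a realistic estimate.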