Sokolowski Dustin J, Mai Mingjie, Verma Arnav, Morgenshtern Gabriela, Subasri Vallijah, Naveed Hareem, Yampolsky Maria, Wilson Michael D, Goldenberg Anna, Erdman Lauren
Department of Molecular Genetics, University of Toronto, ON M5S 3K3, Canada.
Department of Computer Science, University of Toronto, ON M5S 2E4, Canada.
NAR Genom Bioinform. 2025 Mar 4;7(1):lqaf011. doi: 10.1093/nargab/lqaf011. eCollection 2025 Mar.
Many regulatory factors impact the expression of individual genes including, but not limited to, microRNA, long non-coding RNA (lncRNA), transcription factors (TFs), methylation, copy number variation (CNV), and single-nucleotide polymorphisms (SNPs). While each mechanism can influence gene expression substantially, the relative importance of each mechanism at the level of individual genes and tissues is poorly understood. Here, we present the integrative Models of Estimated gene expression (iModEst), which details the relative contribution of different regulators to the expression of 16,000 genes across 21 tissues within The Cancer Genome Atlas (TCGA). Specifically, we derive predictive models of gene expression using tumour data and test their predictive accuracy in tumour and tumour-adjacent tissues. Our models can explain up to 70% of the variance in gene expression across 43% of the genes within both tumour and tumour-adjacent tissues. We confirm that TF expression best predicts gene expression in both tumour and tumour-adjacent tissue, whereas methylation-based predictive models derived in tumour tissues do not transfer well to tumour-adjacent tissues. We find new patterns and recapitulate previously reported relationships between regulators and gene expression, such as CNV-predicted expression and SNP-predicted expression. Together, iModEst offers an interactive, comprehensive atlas of individual regulator-gene-tissue expression relationships as well as relationships between regulators.