GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France.
Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.
Bioinformatics. 2019 Jan 1;35(1):62-68. doi: 10.1093/bioinformatics/bty551.
The Cancer Genome Atlas (TCGA) has greatly advanced cancer research by generating, curating and publicly releasing deeply measured molecular data from thousands of tumor samples. In particular, gene expression measures, both within and across cancer types, have been used to determine the genes and proteins that are active in tumor cells.
To investigate the behavior of gene expression in TCGA tumor samples more thoroughly, we introduce a statistical framework for partitioning the variation in gene expression attributable to a variety of molecular variables, including somatic mutations, transcription factors (TFs), microRNAs, copy number alterations, methylation and germline genetic variation. As a proof of principle, we identify and validate specific TFs that influence the expression of PTPN14 in breast cancer cells.
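To make the idea of partitioning expression variance concrete, the following is a minimal sketch on simulated data. It is not the framework described above: it assumes a plain ordinary least squares fit and a sequential (order-dependent) R² decomposition, and the names `r_squared`, `sequential_partition`, `copy_number`, `methylation` and `expression` are hypothetical, chosen only for illustration.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of the variance in y explained by an OLS fit on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)


def sequential_partition(blocks, y):
    """Incremental R^2 gained as each named block of predictors is added in order."""
    shares = {}
    used = np.empty((len(y), 0))  # no predictors yet
    previous = 0.0
    for name, block in blocks.items():
        used = np.column_stack([used, block])
        current = r_squared(used, y)
        shares[name] = current - previous
        previous = current
    shares["residual"] = 1.0 - previous
    return shares


# Simulated data (an assumption for illustration, not TCGA measurements).
rng = np.random.default_rng(0)
n = 500
copy_number = rng.normal(size=(n, 1))
methylation = rng.normal(size=(n, 1))
expression = 1.0 * copy_number[:, 0] - 0.5 * methylation[:, 0] + rng.normal(size=n)

shares = sequential_partition(
    {"copy_number": copy_number, "methylation": methylation}, expression
)
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

Because the decomposition is sequential, the share credited to each block depends on the order in which blocks are added whenever the predictors are correlated; a variance-components model would avoid that dependence at the cost of a more involved fit.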
We provide a freely available, user-friendly, interactive web application for browsing the results of our transcriptome-wide analyses across 17 different cancers in TCGA at http://ls-shiny-prod.uwm.edu/edge_in_tcga. All TCGA Open Access tier data are available at the Broad Institute GDAC Firehose and were downloaded using the TCGA2STAT R package. TCGA Controlled Access tier data are available through the Genomic Data Commons (GDC). R scripts used to download, format and analyze the data and to produce the interactive R/Shiny web app are available on GitHub at https://github.com/andreamrau/EDGE-in-TCGA.