Quackenbush Adam, Kolluri Jaya, Biju Rohan, Nhong Saron, DeConti Derrick K, Quackenbush John, Saha Enakshi
Boston University Academy, Boston, MA, USA.
University of Chicago, Chicago, IL, USA.
bioRxiv. 2025 Aug 21:2025.08.15.670514. doi: 10.1101/2025.08.15.670514.
Large-scale, open-access datasets such as the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA) include multi-omic data on large numbers of samples along with extensive clinical and phenotypic information. These datasets provide a unique opportunity to discover correlations among clinical and genomic data features that can lead to testable hypotheses and new discoveries. SEAHORSE (http://seahorse.networkmedicine.org/) is a web-based database and search tool for exploratory data analysis in which we have pre-computed statistical associations between the available data elements. Its interface lets users explore significant associations through tabulated summary statistics, data visualizations, and functional enrichment analyses (using RNA-seq data) for identified sets of genes. We describe the motivation for and construction of SEAHORSE and demonstrate its utility by documenting several surprising association patterns observed across multiple tissues in GTEx and multiple cancer types in TCGA.