Tecnologico de Monterrey, Escuela de Medicina, Bioinformática, Monterrey, Nuevo León, México.
Tecnologico de Monterrey, OriGen Project, Monterrey, Nuevo León, México.
PLoS One. 2024 Oct 4;19(10):e0309961. doi: 10.1371/journal.pone.0309961. eCollection 2024.
Coexpression estimations are helpful for the analysis of pathways, cofactors, regulators, targets, and human health and disease. Ideally, coexpression estimations should cover as many diverse cell types as possible and account for the fact that available data are not uniform across tissues. Importantly, the coexpression estimations accessible today are performed at the "tissue level", which relies on formulations standardized by cell type. Little or no attention is paid to overall gene expression levels. The tissue-level estimation assumes that the variance in expression is more important than the mean expression level. Here, we challenge this assumption by estimating coexpression at the "system level", that is, without standardization by tissue, and show that it provides valuable information. We make available a resource to view, download, and analyze both tissue- and system-level coexpression estimations from GTEx human data.
GTEx v8 expression data were globally normalized, batch-processed, and filtered. Then, stringent PCA, clustering, and t-SNE procedures were applied to generate 42 distinct, curated tissue clusters. Coexpression was estimated from these 42 tissue clusters by computing the correlation of 33,445 genes, sampling 70 samples per tissue cluster to avoid tissue overrepresentation. This process was repeated 20 times, and the minimum value across repetitions was taken as a robust estimate. Three metrics were calculated (Pearson, Spearman, and G-statistic) in two data processing modes: at the system level (TPM scale) and at the tissue level (z-score scale).
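The resampling scheme described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `coexpression_min`, its parameters, and the toy data are hypothetical, only the Pearson metric is shown, and it assumes that "minimum value" means the element-wise minimum of the signed correlation across repetitions, which the abstract does not specify. The z-score is applied per gene within each tissue cluster, which is one plausible reading of "tissue-level" standardization.

```python
import numpy as np

def coexpression_min(expr, clusters, n_per_cluster=70, n_repeats=20,
                     mode="system", seed=0):
    """Hypothetical sketch of balanced resampling for coexpression.

    expr     : genes x samples array of expression values (TPM-like scale)
    clusters : 1-D array giving the tissue-cluster label of each sample
    mode     : "system" keeps the original scale; "tissue" z-scores each
               gene within each tissue cluster before correlating
    """
    rng = np.random.default_rng(seed)
    labels = np.unique(clusters)
    result = None
    for _ in range(n_repeats):
        blocks = []
        for lab in labels:
            idx = np.flatnonzero(clusters == lab)
            # Draw the same number of samples from every cluster so that
            # no tissue is overrepresented.
            take = rng.choice(idx, size=min(n_per_cluster, idx.size),
                              replace=False)
            block = expr[:, take].astype(float)
            if mode == "tissue":
                mu = block.mean(axis=1, keepdims=True)
                sd = block.std(axis=1, keepdims=True)
                sd[sd == 0] = 1.0  # avoid division by zero for flat genes
                block = (block - mu) / sd
            blocks.append(block)
        corr = np.corrcoef(np.hstack(blocks))  # gene x gene Pearson matrix
        # Assumption: keep the element-wise minimum across repetitions.
        result = corr if result is None else np.minimum(result, corr)
    return result

# Toy data: genes 0 and 1 share tissue-specific mean levels but vary
# independently within each tissue; gene 2 is unrelated to tissue.
rng = np.random.default_rng(1)
clusters = np.repeat(np.array(["A", "B", "C"]), 100)
means = np.repeat(np.array([5.0, 20.0, 100.0]), 100)
expr = np.vstack([
    means + rng.normal(0, 1, 300),
    means + rng.normal(0, 1, 300),
    rng.normal(50, 5, 300),
])

sys_corr = coexpression_min(expr, clusters, mode="system")
tis_corr = coexpression_min(expr, clusters, mode="tissue")
print("system-level r(gene0, gene1):", round(float(sys_corr[0, 1]), 3))
print("tissue-level r(gene0, gene1):", round(float(tis_corr[0, 1]), 3))
```

In this toy case, the two genes that share tissue-specific mean levels are strongly correlated at the system level but close to uncorrelated at the tissue level, because within-tissue standardization removes the differences in mean expression between tissues.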
We first validate our tissue-level estimations by comparison with other databases. Then, through specific analyses of several examples and literature validation of predictions, we show that system-level coexpression estimations differ from tissue-level estimations and that both contain valuable information reflected in biological pathways. We also show that coexpression estimations are associated with transcriptional regulation. Finally, we present CoGTEx, a web resource to list, view, and explore coexpressed genes in human adult tissues from GTEx v8 data.
We conclude that system-level coexpression is a novel and interesting coexpression metric capable of generating plausible predictions and biological hypotheses, and that CoGTEx is a valuable resource to view, compare, and download system- and tissue-level coexpression estimations from GTEx data.
The web resource is available at http://bioinformatics.mx/cogtex.