Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solnavägen 9, Solna 17165, Sweden.
Faculty of Applied Medical Sciences, University of Al-Baha, Al-Baha 65528, Saudi Arabia.
Cells. 2022 Dec 19;11(24):4121. doi: 10.3390/cells11244121.
To understand complex diseases, high-throughput data are generated at large scale and at multiple levels. However, extracting meaningful information from large datasets for a comprehensive understanding of cell phenotypes and disease pathophysiology remains a major challenge. Despite tremendous advances in understanding the molecular mechanisms of cancer and its progression, current knowledge appears discrete and fragmented. To render this wealth of data more integrated and thus more informative, we have developed the GECIP toolbox to investigate the crosstalk between enriched pathways, and the connectivity of the responsible genes/proteins, from gene expression data. To implement this toolbox, we mainly used three prostate cancer gene expression datasets: GSE17951, GSE8218, and GSE1431. The raw samples were normalized, differentially expressed genes were predicted, and enriched pathways were predicted for those genes. Crosstalk degree was then calculated for the enriched pathways using the number of connections per gene, the frequency of genes in the pathways, the sharing frequency, and the connectivity. For network prediction, the protein-protein interaction network database FunCoup2.0 was used, and Cytoscape was used for network visualization. We found 27, 45, and 22 enriched pathways for GSE17951, GSE8218, and GSE1431, respectively, with 11 pathways common to all three. From the crosstalk results, we observe that the focal adhesion and PI3K pathways, both experimentally proven to be central for cellular output upon perturbation of numerous individual/distinct signaling pathways, displayed the highest crosstalk degree. Moreover, we observe further critical pathways that appear to be highly significant: HIF1a, Hippo, AMPK, and Ras.
In terms of pathway components, GSK3B, YWHAE, HIF1A, ATP1A3, and PRKCA are shared between the aforementioned pathways and have higher connectivity with the pathways and with the other pathway components. Finally, we conclude that the focal adhesion and PI3K pathways are the most critical pathways. Because high-rank enrichment did not translate to a high crosstalk degree for many other pathways, the global impact of one pathway on others appears distinct from enrichment.
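The abstract names the quantities behind the crosstalk degree (gene frequency across pathways, sharing frequency, connectivity) but does not give the formula. The sketch below is therefore only an illustration of how such quantities could be computed from pathway gene sets, not the GECIP implementation. The function names, the pathway memberships, the dataset labels, and the simple "number of pathways sharing at least one gene" proxy are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: pathway name -> set of differentially expressed genes.
# Memberships are invented for illustration and are NOT taken from the study.
pathways = {
    "focal_adhesion": {"GSK3B", "PRKCA", "GENE_A"},
    "PI3K": {"GSK3B", "YWHAE", "PRKCA"},
    "hippo": {"YWHAE", "GENE_B"},
}

# Hypothetical toy input: dataset label -> set of enriched pathway names.
enriched = {
    "dataset_1": {"focal_adhesion", "PI3K", "hippo"},
    "dataset_2": {"focal_adhesion", "PI3K"},
    "dataset_3": {"PI3K", "focal_adhesion", "AMPK"},
}


def common_pathways(enriched_by_dataset):
    """Return the pathways enriched in every dataset (set intersection)."""
    return set.intersection(*enriched_by_dataset.values())


def gene_frequency(pathway_genes):
    """Count in how many pathways each gene appears."""
    return Counter(g for genes in pathway_genes.values() for g in genes)


def shared_genes(pathway_genes):
    """Map each unordered pathway pair to the set of genes the pair shares."""
    return {
        (a, b): pathway_genes[a] & pathway_genes[b]
        for a, b in combinations(sorted(pathway_genes), 2)
    }


def crosstalk_proxy(pathway_genes):
    """Assumed proxy: per pathway, the number of other pathways sharing >= 1 gene."""
    degree = {name: 0 for name in pathway_genes}
    for (a, b), common in shared_genes(pathway_genes).items():
        if common:
            degree[a] += 1
            degree[b] += 1
    return degree


if __name__ == "__main__":
    print(sorted(common_pathways(enriched)))
    print(gene_frequency(pathways).most_common(3))
    print(crosstalk_proxy(pathways))
```

Under this proxy, a pathway can rank high on crosstalk without being the most strongly enriched, which is the distinction the conclusion draws. A fuller version would weight shared genes by their interaction-network connectivity (for example, edges from a protein-protein interaction database) rather than by set overlap alone.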