Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China.
Shandong Provincial Key Laboratory of Oral Tissue Regeneration, School of Stomatology, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China.
Interdiscip Sci. 2022 Jun;14(2):520-531. doi: 10.1007/s12539-022-00506-2. Epub 2022 Feb 23.
Detecting significant signaling pathways during disease progression sheds light on the dysfunctions and pathogenic mechanisms of complex diseases. Since tensor decomposition has proven effective for representing and reconstructing multi-dimensional data, the differences between original and tensor-processed data are expected to carry crucial information and indicate differential behavior. This paper presents tensorGSEA, a tensor-based gene set enrichment analysis built on a data reconstruction method, to identify significant pathways involved in disease development. As a proof-of-concept study, we identify differential pathways of diabetes in rats. Specifically, we first arrange the gene expression profiles of each documented pathway as a tensor with three dimensions: genes, samples, and periods. We then compress each tensor into a lower-rank core tensor. Using these cores, we reconstruct the gene expression profiles of the other state and identify the pathways with lower reconstruction rates. In this way, pathway-level differences between the control and diseased states are extracted through cross-state data reconstruction. The experiments reveal several critical pathways with diabetes-specific functions that alternative methods fail to identify. tensorGSEA efficiently evaluates each pathway by computing its empirical statistical significance. Classification experiments demonstrate that the selected pathways can serve as biomarkers for identifying the diabetic state. The code of tensorGSEA is available at https://github.com/zhxr37/tensorGSEA.
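The cross-state reconstruction idea can be illustrated with a minimal sketch. This is not the tensorGSEA implementation. The following choices are all assumptions made for illustration: the truncated HOSVD used for compression, projecting only the gene and period modes (because the two states may have different numbers of samples), defining the "reconstruction rate" as the fraction of Frobenius-norm energy retained, scoring significance with a permutation test over random gene sets of equal size, and every function name and the synthetic data. See the linked repository for the authors' actual code.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` along axis `mode` by `matrix` (new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)


def hosvd_factors(tensor, ranks):
    """Truncated HOSVD (assumed compression): leading left singular vectors per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return factors


def cross_state_rate(x_ref, x_target, ranks):
    """Fit gene and period subspaces on x_ref (genes x samples x periods) and return
    the fraction of x_target's energy they reconstruct (hypothetical rate definition).
    The sample mode is not projected because the states may differ in sample count."""
    u_gene, _, u_period = hosvd_factors(x_ref, ranks)
    recon = mode_product(x_target, u_gene @ u_gene.T, 0)
    recon = mode_product(recon, u_period @ u_period.T, 2)
    residual = np.linalg.norm(x_target - recon) ** 2
    return 1.0 - residual / np.linalg.norm(x_target) ** 2


def empirical_p(control, disease, gene_idx, ranks, n_perm=200, seed=0):
    """Hypothetical permutation test: compare the pathway's rate with random gene
    sets of the same size. A low rate means control-derived subspaces explain the
    disease data poorly, so the p-value counts null rates at or below it."""
    rng = np.random.default_rng(seed)
    observed = cross_state_rate(control[gene_idx], disease[gene_idx], ranks)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.choice(control.shape[0], size=len(gene_idx), replace=False)
        null[i] = cross_state_rate(control[idx], disease[idx], ranks)
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, float(p_value)


# Synthetic demo: 100 genes, 6 control / 5 disease samples, 4 periods.
rng = np.random.default_rng(1)
gene_loading = rng.normal(size=100)
period_profile = np.array([1.0, 2.0, 3.0, 4.0])
base = gene_loading[:, None, None] * period_profile[None, None, :]
control = base + 0.1 * rng.normal(size=(100, 6, 4))
disease = base + 0.1 * rng.normal(size=(100, 5, 4))
pathway = np.arange(10)
disease[pathway] = rng.normal(size=(10, 5, 4))  # disrupt this gene set in disease only
observed, p = empirical_p(control, disease, pathway, ranks=(2, 2, 2))
print(f"reconstruction rate={observed:.3f}, empirical p={p:.4f}")
```

In this toy setup, the disrupted gene set is poorly reconstructed by subspaces learned from the control state, while random gene sets are reconstructed almost perfectly. As a result, the disrupted set receives a small empirical p-value. Fitting on one state and reconstructing the other is what makes the score sensitive to between-state differences rather than to overall pathway complexity.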