College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China.
Pazhou Lab, Guangzhou, 510335, China.
Interdiscip Sci. 2024 Sep;16(3):554-567. doi: 10.1007/s12539-024-00606-1. Epub 2024 Mar 1.
Sarcomas are malignant tumors arising from mesenchymal tissue and are characterized by their complexity and diversity. Their high recurrence rate makes it important to understand the mechanisms behind recurrence and to develop personalized treatments and drugs. However, previous studies of multi-modal association patterns in sarcoma recurrence have overlooked the fact that genes do not act independently but function within signaling pathways. This study therefore collected 290 whole slide images, together with data on 869 genes and 1,387 pathways from more than 260 sarcoma samples in UCSC and TCGA, to identify gene-pathway-cell association patterns related to sarcoma recurrence. Because most multi-modal data fusion methods based on the joint non-negative matrix factorization (NMF) model initialize the factorization parameters randomly, their results are poorly reproducible; the study therefore proposed a singular value decomposition (SVD)-driven joint NMF model that uses SVD to compute the initial weight and coefficient matrices, making the results reproducible. Experimental comparisons indicated that SVD-based initialization improves the performance of the joint NMF algorithm. Furthermore, the representative module revealed a significant relationship between genes in pathways and image features. Multi-level analysis provided valuable insights into the connections among biological processes, cellular features, and sarcoma recurrence. In addition, potential biomarkers were uncovered, and several mechanisms of sarcoma recurrence were identified from an imaging genetics perspective. Overall, the SVD-NMF model offers a new perspective on combining multi-omics data to explore associations related to sarcoma recurrence.
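The abstract does not give the model's equations or the exact SVD procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a plain Frobenius-loss joint NMF in which each modality X_I (samples × features) is approximated as W H_I with a shared sample factor W, fitted with Lee-Seung multiplicative updates. For the deterministic start it swaps in NNDSVD (Boutsidis and Gallopoulos), applied to the column-concatenated data, with zeros filled by the data mean (the "NNDSVDa" variant) so multiplicative updates can move them. The function names `nndsvd_init` and `joint_nmf` are hypothetical, and the toy data are random, not sarcoma data. Any extra constraints or module-extraction steps used in the paper are not shown.

```python
import numpy as np


def nndsvd_init(A, k, eps=1e-12):
    """Deterministic NNDSVD-style initialization of A ≈ W @ H (assumed variant)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    m, n = A.shape
    W, H = np.zeros((m, k)), np.zeros((k, n))
    # The leading singular pair of a nonnegative matrix can be taken nonnegative.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        xp_n, yp_n = np.linalg.norm(xp), np.linalg.norm(yp)
        xn_n, yn_n = np.linalg.norm(xn), np.linalg.norm(yn)
        # Keep the sign pattern that carries more of this rank-one term.
        if xp_n * yp_n >= xn_n * yn_n:
            u, v, nu, nv = xp, yp, xp_n, yp_n
        else:
            u, v, nu, nv = xn, yn, xn_n, yn_n
        sigma = nu * nv
        if sigma > eps:
            scale = np.sqrt(S[j] * sigma)
            W[:, j] = scale * u / nu
            H[j, :] = scale * v / nv
    # NNDSVDa-style fill: zeros would otherwise stay zero under multiplicative updates.
    avg = A.mean()
    W[W < eps] = avg
    H[H < eps] = avg
    return W, H


def joint_nmf(Xs, k, n_iter=200, eps=1e-10):
    """Joint NMF sketch: X_I ≈ W @ H_I for every modality I, with a shared W."""
    widths = [X.shape[1] for X in Xs]
    W, H_cat = nndsvd_init(np.hstack(Xs), k)
    Hs = np.split(H_cat, np.cumsum(widths)[:-1], axis=1)

    def loss():
        return sum(np.linalg.norm(X - W @ H, "fro") ** 2 for X, H in zip(Xs, Hs))

    losses = [loss()]
    for _ in range(n_iter):
        # Update each modality-specific H_I with W fixed.
        WtW = W.T @ W
        Hs = [H * (W.T @ X) / (WtW @ H + eps) for X, H in zip(Xs, Hs)]
        # Update the shared W using terms summed over all modalities.
        num = sum(X @ H.T for X, H in zip(Xs, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W = W * num / den
        losses.append(loss())
    return W, Hs, losses


# Hypothetical toy data: 30 samples with three feature blocks of different widths.
rng = np.random.default_rng(0)
Xs = [rng.random((30, p)) for p in (40, 25, 15)]
W1, Hs1, losses1 = joint_nmf(Xs, k=4)
W2, Hs2, losses2 = joint_nmf(Xs, k=4)
print(np.allclose(W1, W2))  # True: no random initialization is involved
```

Because the only randomness is in generating the toy data, two fits on the same input give the same factors, which is the reproducibility property the abstract attributes to SVD-based initialization. With this plain objective, joint NMF reduces to ordinary NMF on the column-concatenated matrix; the paper's model may differ.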