Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, People's Republic of China.
Key Laboratory of Smart Farming for Agricultural Animals, Ministry of Agriculture and Rural Affairs, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, People's Republic of China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae541.
Using multi-omics data for clustering (cancer subtyping) is crucial for precision medicine research. Although numerous methods have been proposed, current approaches either perform unsatisfactorily or lack biological interpretability, which limits their practical application. Based on the biological hypothesis that patients with the same subtype may exhibit similar dysregulated pathways, we developed an Iterative Pathway Fusion approach for enhanced Multi-omics Clustering (IPFMC), a novel multi-omics clustering method involving two data fusion stages. In the first stage, omics data are partitioned at each layer using pathway information, and crucial pathways are iteratively selected to represent samples. The representation information from multiple pathways is then integrated. In the second stage, similarity network fusion is applied to integrate the representation information from multiple omics. Comparative experiments on nine cancer datasets from The Cancer Genome Atlas (TCGA), involving systematic comparisons with 10 representative methods, show that IPFMC outperforms these methods. Additionally, the biological pathways and genes identified by our approach are biologically significant, supporting not only its excellent clustering performance but also its biological interpretability.
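The two fusion stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel, the median bandwidth, the plain averaging across pathways, and the function names (`rbf_similarity`, `pathway_similarity`, `knn_kernel`, `fuse`) are all assumptions made for illustration. The iterative selection of crucial pathways is omitted (every pathway is kept), and the second stage is a simplified cross-diffusion in the spirit of similarity network fusion rather than the exact published algorithm.

```python
import numpy as np


def rbf_similarity(x):
    """Gaussian similarity between the rows of a (samples x features) matrix."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    positive = d2[d2 > 0]
    # Median squared distance as bandwidth: a common heuristic, assumed here.
    sigma2 = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma2))


def pathway_similarity(omics, pathways):
    """Stage 1 (simplified): one similarity matrix per pathway, then averaged.

    `pathways` maps a pathway name to the column indices of its genes.
    The paper's iterative pathway selection is NOT reproduced here.
    """
    sims = [rbf_similarity(omics[:, cols]) for cols in pathways.values()]
    return np.mean(sims, axis=0)


def row_normalize(w):
    """Scale each row so that it sums to 1."""
    return w / w.sum(axis=1, keepdims=True)


def knn_kernel(w, k):
    """Keep each sample's k most similar other samples, row-normalized."""
    n = w.shape[0]
    masked = w.copy()
    np.fill_diagonal(masked, -np.inf)  # never pick a sample as its own neighbor
    idx = np.argsort(-masked, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    s = np.zeros_like(w)
    s[rows, idx] = w[rows, idx]
    return row_normalize(s)


def fuse(similarities, k=5, iterations=10):
    """Stage 2 (simplified): cross-diffusion across omics layers."""
    if len(similarities) < 2:
        raise ValueError("need at least two omics layers to fuse")
    p = [row_normalize(w) for w in similarities]   # global kernels
    s = [knn_kernel(w, k) for w in similarities]   # local (kNN) kernels
    for _ in range(iterations):
        updated = []
        for v in range(len(p)):
            # Diffuse the average of the other layers through layer v's kNN graph.
            others = np.mean([p[u] for u in range(len(p)) if u != v], axis=0)
            updated.append(row_normalize(s[v] @ others @ s[v].T))
        p = updated
    fused = np.mean(p, axis=0)
    return (fused + fused.T) / 2.0  # symmetrize the final network


# Toy usage: two omics layers, 20 samples, 12 genes, three hypothetical pathways.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 10)
toy_pathways = {"pw_a": [0, 1, 2, 3], "pw_b": [4, 5, 6, 7], "pw_c": [8, 9, 10, 11]}
layers = [rng.normal(size=(20, 12)) + 5.0 * groups[:, None] for _ in range(2)]
fused_network = fuse([pathway_similarity(x, toy_pathways) for x in layers])
```

The fused sample-by-sample network could then be passed to a clustering step that accepts a precomputed affinity matrix, for example spectral clustering; that choice is likewise an assumption here, since the abstract does not name the final clustering algorithm.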