Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland.
PLoS Comput Biol. 2024 Aug 5;20(8):e1012275. doi: 10.1371/journal.pcbi.1012275. eCollection 2024 Aug.
Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow data-driven and pathway-driven clustering to be compared, which leaves a significant methodological gap. We present the COPS R-package, tailored for robust evaluation of single- and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not otherwise accessible through R, and some correspond to new approaches proposed with COPS. We applied our framework to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, copy number variation (CNV), miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the adjusted Rand index (ARI), survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. It is therefore essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously in order to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the concept of Pareto efficiency to gauge the balance of evaluation metrics in each cancer case study. Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different survival outcomes in several case studies.
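
The multi-objective selection described above can be illustrated with a small sketch of Pareto efficiency. This is not the COPS implementation (COPS is an R package; Python is used here only for illustration). The method names, the three metrics, and all numbers are hypothetical, and every metric is assumed to be oriented so that higher is better. A method is kept if no other method is at least as good on every metric and strictly better on at least one.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of all methods that no other method dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical scores per method: (stability, ARI, -log10 survival p-value).
# These values are invented for illustration and are not results from the paper.
example_scores = {
    "method_A": (0.90, 0.40, 2.0),
    "method_B": (0.80, 0.50, 1.5),
    "method_C": (0.70, 0.30, 1.0),
    "method_D": (0.90, 0.40, 2.5),
}

front = pareto_front(example_scores)
print(front)
```

In this toy example, method_A is dominated by method_D and method_C is dominated by method_A, so only method_B and method_D remain on the front. Neither of the two is better on all three metrics, which is the kind of trade-off a Pareto-based comparison is meant to expose.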