Cancer subtype identification by multi-omics clustering based on interpretable feature and latent subspace learning.

Affiliation

Department of Computer Science, University of Tsukuba, Tsukuba 305-8577, Japan.

Publication information

Methods. 2024 Nov;231:144-153. doi: 10.1016/j.ymeth.2024.09.014. Epub 2024 Sep 24.

Abstract

In recent years, multi-omics clustering has become a powerful tool in cancer research, offering a comprehensive perspective on the diverse molecular characteristics inherent to various cancer subtypes. However, most existing multi-omics clustering methods directly integrate heterogeneous features from different omics; such methods may struggle with the noise or redundancy of multi-omics data, leading to poor clustering results. Therefore, we propose a novel multi-omics clustering method that extracts interpretable and discriminative features from each omics type before data integration. Clinical information is used to supervise feature extraction based on SHAP (SHapley Additive exPlanations) values. Singular value decomposition (SVD) is then applied to integrate the extracted features of the different omics by constructing a latent subspace. Finally, we apply shared nearest neighbor-based spectral clustering to the latent representation to obtain the clustering result. The proposed method is evaluated on several cancer datasets spanning three levels of omics and compared with several state-of-the-art multi-omics clustering methods. The comparison results demonstrate the superior performance of the proposed method in multi-omics data analysis for cancer subtyping. Additionally, experiments show that using clinical information, via SHAP values, for feature extraction improves clustering performance. Enrichment analysis of the gene signatures identified in the different subtypes is also performed to further demonstrate the effectiveness of the proposed method. Availability: The proposed method is freely available at https://github.com/Tianyi-Shi-Tsukuba/Multi-omics-clustering-based-on-SHAP. Data will be made available on request.
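The three-stage pipeline described in the abstract (clinically supervised SHAP feature selection per omics, SVD integration into a latent subspace, shared nearest neighbor (SNN)-based spectral clustering) can be sketched in NumPy as below. This is a minimal, hypothetical illustration, not the authors' implementation, which lives in the repository linked above. Assumptions: the abstract does not name the supervised model or the SHAP estimator, so the sketch swaps in ridge regression on a single continuous clinical variable and uses the closed-form linear-SHAP values w_j * (x_ij - mean_j), which are exact Shapley values for a linear model under feature independence. The top-k selection rule, z-scoring before SVD, the kNN-based SNN definition, the Ng-Jordan-Weiss spectral step with a plain k-means, and all function names and parameter values (`top_k`, `alpha`, `n_neighbors`, and so on) are illustrative choices.

```python
# Hypothetical sketch of SHAP-guided multi-omics clustering; not the authors' code.
import numpy as np


def linear_shap_importance(X, y, alpha=10.0):
    """Mean |SHAP| per feature for a ridge model fit to a clinical target y.

    Assumption: for a linear model f(x) = w.x + b with independent features,
    the SHAP value of feature j for sample i is w_j * (x_ij - mean_j).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    shap_values = Xc * w  # (n_samples, n_features)
    return np.abs(shap_values).mean(axis=0)


def select_features(X, y, top_k):
    """Keep the top_k columns of X ranked by mean |SHAP| (hypothetical rule)."""
    ranked = np.argsort(linear_shap_importance(X, y))[::-1]
    return X[:, np.sort(ranked[:top_k])]


def svd_latent(blocks, n_components):
    """Concatenate z-scored omics blocks and keep the top singular directions."""
    Z = np.hstack([(B - B.mean(axis=0)) / (B.std(axis=0) + 1e-8) for B in blocks])
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def snn_affinity(L, k):
    """Shared-nearest-neighbor similarity |kNN(i) & kNN(j)| / k."""
    n = L.shape[0]
    d = ((L[:, None, :] - L[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]  # drop column 0 (the point itself)
    M = np.zeros((n, n))
    M[np.arange(n)[:, None], knn] = 1.0
    return (M @ M.T) / k  # entry (i, j) counts neighbors shared by i and j


def spectral_cluster(W, n_clusters, n_iter=100):
    """Ng-Jordan-Weiss normalized spectral clustering on an affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    V = vecs[:, -n_clusters:].copy()         # leading eigenvectors
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    # Plain k-means with deterministic farthest-point initialization.
    centers = [V[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((V[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        new = np.array([V[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_multiomics(omics, clinical, top_k, n_components, n_neighbors, n_clusters):
    """Select features per omics, integrate by SVD, then cluster the latent space."""
    selected = [select_features(X, clinical, top_k) for X in omics]
    latent = svd_latent(selected, n_components)
    return spectral_cluster(snn_affinity(latent, n_neighbors), n_clusters)


# Toy usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 50)                        # two planted "subtypes"
clinical = groups + 0.05 * rng.standard_normal(100)   # stand-in clinical variable
expr = rng.standard_normal((100, 30)); expr[:, :10] += 6 * groups[:, None]
meth = rng.standard_normal((100, 25)); meth[:, :10] -= 6 * groups[:, None]
labels = cluster_multiomics([expr, meth], clinical, top_k=10,
                            n_components=2, n_neighbors=10, n_clusters=2)
print(labels)
```

Because SNN similarity counts shared neighbors rather than using raw distances, it is less sensitive to the scale of the latent coordinates. In the toy example the two planted groups are expected to share no neighbors, so the affinity matrix becomes block-diagonal and the spectral step separates them.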
