School of Computer Science, Qufu Normal University, Rizhao, 276826, China.
School of Information and Electrical Engineering, Ludong University, Yantai, 264025, China.
BMC Bioinformatics. 2022 Jan 20;22(Suppl 12):334. doi: 10.1186/s12859-021-04220-6.
Identifying cancer types is of great significance for the early diagnosis and clinical treatment of cancer. Clustering cancer samples is an important means of identifying cancer types and has received considerable attention in bioinformatics. The goal of cancer clustering is to find the expression patterns of different cancer types, so that samples with similar expression patterns are grouped into the same type. To improve the accuracy and reliability of cancer clustering, many clustering methods have begun to focus on the integrative analysis of cancer multi-omics data. Methods based on multi-omics data generally have advantages over those using single-omics data. However, the high heterogeneity and noise of cancer multi-omics data pose a great challenge for multi-omics analysis methods.
In this study, to extract more complementary information from cancer multi-omics data for cancer clustering, we propose a low-rank subspace clustering method called multi-view manifold regularized compact low-rank representation (MmCLRR). In MmCLRR, each omics data type is regarded as a view, and the method learns a consistent subspace representation by imposing a consistency constraint on the low-rank affinity matrix of each view, which balances the agreement between different views. Manifold regularization and concept factorization are also introduced into the method. With concept factorization, the dictionary can be updated during learning, which greatly improves the subspace learning ability of the low-rank representation. We adopt the linearized alternating direction method with adaptive penalty to solve the optimization problem of MmCLRR.
Finally, we apply MmCLRR to the clustering of cancer samples based on multi-omics data, and the clustering results show that our method outperforms existing multi-view methods.
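The idea of treating each omics type as a view and combining per-view low-rank affinity matrices can be illustrated with a small sketch. This is not the MmCLRR algorithm: it omits manifold regularization, concept factorization, the consistency constraint, and the adaptive-penalty solver. It assumes noise-free toy data, uses the closed-form solution of noiseless low-rank representation (the projection built from the top right singular vectors of each view), and swaps in a plain average of the per-view affinities as a stand-in for the consistency constraint. The function names `low_rank_affinity`, `consensus_affinity`, and `make_toy_view` are hypothetical.

```python
import numpy as np

def low_rank_affinity(X, rank):
    """Affinity from noiseless low-rank representation (assumption: no noise).

    X is a (features x samples) matrix for one view. The minimizer of the
    nuclear norm of Z subject to X = XZ is Z = V_r V_r^T, where V_r holds
    the top right singular vectors of X.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T                 # samples x rank
    return np.abs(V @ V.T)          # symmetric, non-negative affinity

def consensus_affinity(views, rank):
    """Average the per-view affinities (a simple stand-in for a
    consistency constraint across views)."""
    return np.mean([low_rank_affinity(X, rank) for X in views], axis=0)

def make_toy_view(rng, n_features, labels, dim=2):
    """Toy view: each cluster's samples lie in their own random subspace."""
    labels = np.asarray(labels)
    X = np.zeros((n_features, labels.size))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        basis = rng.standard_normal((n_features, dim))
        X[:, idx] = basis @ rng.standard_normal((dim, idx.size))
    return X

# Two hypothetical "omics" views of the same 20 samples, two clusters.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
views = [make_toy_view(rng, d, labels) for d in (30, 50)]

# Two clusters, each in a 2-dimensional subspace, so the data has rank 4.
W = consensus_affinity(views, rank=4)

same = labels[:, None] == labels[None, :]
within = W[same].mean()     # average affinity inside clusters
between = W[~same].mean()   # average affinity across clusters
```

On such toy data the consensus affinity is block-diagonal up to numerical precision, so `within` is far larger than `between`. In practice, a spectral clustering step would typically be run on `W` to obtain the sample groups.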