IEEE J Biomed Health Inform. 2022 Jul;26(7):3578-3589. doi: 10.1109/JBHI.2022.3151333. Epub 2022 Jul 1.
Cancer genomic data generally consist of multiple views from different sources. Together, these views capture gene activity at different levels and give a more comprehensive picture of the cancer. Low-rank representation (LRR), a powerful subspace clustering method, has been extended and applied to cancer data research. Multi-view learning methods based on LRR have achieved good results in cancer multi-omics analysis because they account for both the consistency and the complementarity between views, but they fall short in mining the latent local geometry of the data. To address this, this paper proposes a new method, Multi-view Random-walk Graph regularization Low-Rank Representation (MRGLRR), for the comprehensive analysis of multi-view genomics data. The method uses a multi-view model to find the common centroid of the views. By constructing a joint affinity matrix to learn the low-rank subspace representation of multiple data sets, it fully captures the hidden information in each view. The method also introduces a random-walk graph regularization constraint to obtain more accurate similarities between samples. Unlike traditional graph regularization, after the KNN graph is constructed, the weight matrix is obtained with a random walk algorithm, which retains more local geometric information and better learns the topological structure of the data. In addition, a feature gene selection strategy suited to the multi-view model is proposed to find more differentially expressed genes of research value. Experimental results show that the method outperforms other representative methods in clustering and feature gene selection on cancer multi-omics data.
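The random-walk weighting step described above can be illustrated with a minimal sketch. The abstract does not give the exact random-walk formulation, so everything here is an assumption for illustration: the function names `knn_affinity`, `random_walk_weights`, and `laplacian`, the Gaussian edge weights, the parameters `k`, `sigma`, and `steps`, and the choice of a symmetrized t-step transition matrix as the weight matrix. It should not be read as the authors' implementation.

```python
import numpy as np

def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric KNN graph with Gaussian edge weights (assumed kernel).

    X has shape (n_samples, n_features); requires k < n_samples.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    n = X.shape[0]
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest samples
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma ** 2))
    return np.maximum(A, A.T)  # keep an edge if either endpoint selected it

def random_walk_weights(A, steps=3):
    """Weight matrix from a t-step random walk on the KNN graph (assumed form)."""
    row_sums = np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    P = A / row_sums                       # one-step transition probabilities
    Pt = np.linalg.matrix_power(P, steps)  # probabilities after `steps` steps
    return (Pt + Pt.T) / 2.0               # symmetrize for use as graph weights

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

# Toy data standing in for one omics view: two groups of 10 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 4)), rng.normal(3.0, 1.0, (10, 4))])
W = random_walk_weights(knn_affinity(X, k=4), steps=3)
L = laplacian(W)
```

Because the walk takes several steps, `W` assigns nonzero weight to samples that are connected only through shared neighbours, which is one way a random walk can carry more local structure than the raw KNN edges. In graph-regularized LRR variants a Laplacian of this kind usually enters the objective through a trace term; the exact term used by MRGLRR is not stated in this abstract.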