IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2375-2383. doi: 10.1109/TCBB.2020.2975173. Epub 2021 Dec 8.
Non-negative matrix factorization (NMF) is a dimensionality reduction technique based on high-dimensional mapping that effectively learns part-based representations. In this paper, we propose a method called Dual Hyper-graph Regularized Supervised Non-negative Matrix Factorization (HSNMF). To encode the geometric information of the data, a hyper-graph is introduced into the model as a regularization term. The advantage of hyper-graph learning is that it captures higher-order relationships among data points, beyond the pairwise links of an ordinary graph, which strengthens the relevance structure of the data. The method constructs both a data hyper-graph and a feature hyper-graph so that the data manifold and the feature manifold are found simultaneously. Applied to cancer datasets, hyper-graph theory can effectively identify pathogenic genes. Discriminative information is further introduced into the objective function to obtain more information about the data, and supervised learning with label information greatly improves classification performance. Furthermore, real cancer datasets usually contain sparse noise, so the L2,1-norm is applied to enhance the robustness of the HSNMF algorithm. Experiments on The Cancer Genome Atlas (TCGA) datasets verify the feasibility of the HSNMF method.
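To make the hyper-graph regularization idea concrete, the following is a minimal sketch and not the HSNMF algorithm itself. It assumes a k-nearest-neighbour hyper-graph over samples with unit hyperedge weights, the unnormalized hyper-graph Laplacian L = Dv - H W De^-1 H^T, and standard graph-regularized NMF multiplicative updates under a Frobenius loss. The feature hyper-graph, the label (discriminative) term, and the robust norm described in the abstract are deliberately omitted. The function names and the parameters `k`, `lam`, and `iters` are hypothetical choices made for illustration.

```python
import numpy as np

def knn_hypergraph_laplacian(X, k=3):
    """Unnormalized hyper-graph Laplacian over the columns (samples) of X.

    Assumption for this sketch: one hyperedge per sample, containing the
    sample and its k nearest neighbours, with all hyperedge weights = 1.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, -np.inf)       # force each sample into its own hyperedge
    H = np.zeros((n, n))                  # incidence matrix: vertices x hyperedges
    for e in range(n):
        members = np.argsort(dist[e])[:k + 1]
        H[members, e] = 1.0
    w = np.ones(n)                        # hyperedge weights (assumed uniform)
    de = H.sum(axis=0)                    # hyperedge degrees
    S = H @ np.diag(w / de) @ H.T         # affinity part: H W De^-1 H^T
    Dv = np.diag(H @ w)                   # vertex degree matrix
    return Dv - S, S, Dv

def hypergraph_nmf(X, r, lam=0.1, k=3, iters=200, seed=0):
    """Sketch of NMF with a data-side hyper-graph regularizer.

    Minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V) with U, V >= 0
    using multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, r)) + 1e-3         # basis matrix (features x r)
    V = rng.random((n, r)) + 1e-3         # coefficient matrix (samples x r)
    L, S, Dv = knn_hypergraph_laplacian(X, k)
    eps = 1e-10                           # guards against division by zero
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (S @ V)) / (V @ (U.T @ U) + lam * (Dv @ V) + eps)
    return U, V, L
```

Calling `hypergraph_nmf(X, r=3)` on a non-negative matrix `X` with features as rows and samples as columns returns non-negative factors `U` and `V` together with the Laplacian `L`. Splitting `L` into `Dv` and `S` is what keeps the multiplicative updates non-negative, since both parts have non-negative entries.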