IEEE J Biomed Health Inform. 2020 Oct;24(10):3002-3011. doi: 10.1109/JBHI.2020.2975199. Epub 2020 Feb 20.
Non-negative Matrix Factorization (NMF) is a dimensionality reduction approach that learns a parts-based, linear representation of non-negative data, and it has attracted growing attention for this reason. In practice, however, NMF neglects both the manifold structure of the data samples and the prior label information of different classes. In this paper, a novel matrix decomposition method called Hyper-graph regularized Constrained Non-negative Matrix Factorization (HCNMF) is proposed for selecting differentially expressed genes and classifying tumor samples. The advantage of hyper-graph learning is that it captures local spatial information in high-dimensional data. HCNMF incorporates a hyper-graph regularization constraint to account for higher-order relationships among data samples, and the application of hyper-graph theory can effectively identify pathogenic genes in cancer datasets. In addition, label information is incorporated into the objective function to improve the discriminative ability of the decomposition matrix; supervised learning with this label information greatly improves classification performance. We also provide iterative update rules and convergence proofs for the HCNMF optimization problem. A set of evaluations on The Cancer Genome Atlas (TCGA) datasets confirms the superiority of HCNMF over other representative algorithms.
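The abstract does not reproduce the HCNMF update rules, so the following NumPy code is only a rough sketch of how its two named ingredients can be combined, under assumptions of our own rather than the paper's exact formulation. It assumes (1) a k-nearest-neighbour hypergraph over samples with heat-kernel hyperedge weights and the unnormalized hypergraph Laplacian L = D_v − H W D_e⁻¹ Hᵀ, and (2) the label-constraint matrix A of constrained NMF (CNMF), which forces labeled samples of the same class onto the same coefficient row of V = AZ. It then minimizes ‖X − U(AZ)ᵀ‖²_F + λ Tr((AZ)ᵀ L AZ) with Lee and Seung style multiplicative updates. The objective, the hyperedge construction, and all function names (`hypergraph_laplacian`, `label_constraint_matrix`, `hcnmf_sketch`) are hypothetical.

```python
import numpy as np


def hypergraph_laplacian(X, k=5, sigma=None):
    """Return (Dv, S) with L = Dv - S (assumed unnormalized hypergraph Laplacian).

    Hypothetical construction: one hyperedge per sample (columns of X), made of
    its k+1 closest samples, weighted by a heat kernel on squared distances.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    if sigma is None:
        sigma = np.sqrt(np.mean(dist2)) + 1e-12
    H = np.zeros((n, n))  # incidence matrix: vertices x hyperedges
    w = np.zeros(n)       # hyperedge weights
    for e in range(n):
        nbrs = np.argsort(dist2[e])[: k + 1]  # normally sample e plus k neighbours
        H[nbrs, e] = 1.0
        w[e] = np.sum(np.exp(-dist2[e, nbrs] / sigma ** 2))
    Dv = np.diag(H @ w)                     # vertex degrees d(v) = sum_e w(e) h(v, e)
    De_inv = np.diag(1.0 / H.sum(axis=0))   # inverse hyperedge degrees
    S = H @ np.diag(w) @ De_inv @ H.T
    return Dv, S


def label_constraint_matrix(labels, n):
    """CNMF-style A = [[C, 0], [0, I]]; the labeled samples must be the first columns of X."""
    labels = np.asarray(labels)
    l, classes = labels.size, np.unique(labels)
    c = classes.size
    A = np.zeros((n, c + n - l))
    A[np.arange(l), np.searchsorted(classes, labels)] = 1.0  # class indicator rows
    A[l:, c:] = np.eye(n - l)                                 # unlabeled samples stay free
    return A


def objective(X, U, V, L, lam):
    return np.linalg.norm(X - U @ V.T, "fro") ** 2 + lam * np.trace(V.T @ L @ V)


def hcnmf_sketch(X, labels, k, lam=1.0, n_neighbors=5, n_iter=200, seed=0):
    """Multiplicative updates for the assumed objective; returns U, V = AZ, objective history."""
    m, n = X.shape
    A = label_constraint_matrix(labels, n)
    Dv, S = hypergraph_laplacian(X, n_neighbors)
    rng = np.random.default_rng(seed)
    U, Z = rng.random((m, k)), rng.random((A.shape[1], k))
    AtA, AtSA, AtDA = A.T @ A, A.T @ S @ A, A.T @ Dv @ A
    eps = 1e-10  # guards against division by zero
    history = [objective(X, U, A @ Z, Dv - S, lam)]
    for _ in range(n_iter):
        V = A @ Z
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        Z *= (A.T @ X.T @ U + lam * AtSA @ Z) / (AtA @ Z @ (U.T @ U) + lam * AtDA @ Z + eps)
        history.append(objective(X, U, A @ Z, Dv - S, lam))
    return U, A @ Z, history


# Toy usage: 50 "genes" x 30 "samples"; the first 6 samples are labeled.
rng = np.random.default_rng(1)
X = rng.random((50, 30))
labels = [0, 0, 0, 1, 1, 1]
U, V, history = hcnmf_sketch(X, labels, k=4)
print(history[0], history[-1])
```

Splitting the Laplacian into its non-negative parts D_v and S keeps every numerator and denominator of the Z update non-negative, which is what lets the multiplicative form preserve non-negativity; the rows of V that belong to labeled samples of the same class come out identical because they share a row of A.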