Cancer molecular pattern discovery by subspace consensus kernel classification.

Author information

Han Xiaoxu

Affiliation

Department of Mathematics and Bioinformatics Program, Eastern Michigan University, Ypsilanti, MI 48197, USA.

Publication information

Comput Syst Bioinformatics Conf. 2007;6:55-65.

PMID:17951812

Abstract

Efficient discovery of cancer molecular patterns is essential in molecular diagnostics. The characteristics of gene/protein expression data challenge traditional unsupervised classification algorithms. In this work, we describe a subspace consensus kernel clustering algorithm based on projected gradient nonnegative matrix factorization (PG-NMF). The algorithm is a consensus kernel hierarchical clustering (CKHC) method applied in the subspace generated by PG-NMF. It integrates parts-based learning with sound convergence, subspace clustering, and kernel-space clustering for the classification of microarray and proteomics data. We first integrate subspace methods and kernel methods following our framework of input-space, subspace, and kernel-space clustering. On four benchmark cancer datasets, our algorithm gives more effective classification results than classic NMF, sparse-NMF, and the supervised classifiers KNN and SVM. By selecting different transforms to generate subspaces and different kernel clustering algorithms to cluster the data, our approach can generate a family of classification algorithms in machine learning.
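The pipeline summarized above (factorize the data, represent samples in the NMF subspace, cluster them with a kernel, and combine repeated runs into a consensus) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes a nonnegative features × samples matrix X ≈ WH, with each sample represented by its column of H. The projected-gradient updates use a fixed step 1/L in place of the Armijo-type line search of Lin's PG-NMF. The kernel is Gaussian with a median-heuristic bandwidth, and clustering is average-linkage agglomerative clustering on kernel-induced distances. The consensus step averages sample connectivity matrices over random restarts, in the style of consensus NMF clustering, before a final hierarchical clustering. The names `pg_nmf`, `kernel_distance`, `average_linkage`, and `subspace_consensus_kernel_clustering` are hypothetical.

```python
import numpy as np


def pg_nmf(X, k, iters=500, seed=0):
    """Factor nonnegative X (features x samples) as W @ H with W, H >= 0.

    Alternating projected-gradient steps on 0.5 * ||X - W H||_F^2: each step
    has fixed size 1/L (L = Lipschitz constant of that block's gradient) and is
    followed by projection onto the nonnegative orthant via max(., 0).
    ASSUMPTION: Lin's PG-NMF chooses step sizes by an Armijo-type line search.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        WtW = W.T @ W
        L = np.linalg.norm(WtW, 2) + 1e-12
        H = np.maximum(H - (WtW @ H - W.T @ X) / L, 0.0)  # grad_H = W^T (WH - X)
        HHt = H @ H.T
        L = np.linalg.norm(HHt, 2) + 1e-12
        W = np.maximum(W - (W @ HHt - X @ H.T) / L, 0.0)  # grad_W = (WH - X) H^T
    return W, H


def kernel_distance(Y, gamma=None):
    """Pairwise distances between rows of Y in a Gaussian-kernel feature space."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    if gamma is None:
        # ASSUMPTION: median heuristic for the bandwidth, not taken from the paper.
        med = np.median(sq[np.triu_indices(len(Y), 1)])
        gamma = 1.0 / med if med > 0 else 1.0
    K = np.exp(-gamma * sq)
    # Feature-space distance: k(a,a) + k(b,b) - 2k(a,b) = 2 - 2k(a,b).
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))


def average_linkage(D, c):
    """Agglomerative clustering (average linkage) on distance matrix D down to c clusters."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > c:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(D.shape[0], dtype=int)
    for idx, members in enumerate(clusters):
        labels[members] = idx
    return labels


def subspace_consensus_kernel_clustering(X, k, c, runs=10, gamma=None):
    """Cluster the samples (columns) of X into c groups; returns (labels, consensus)."""
    m = X.shape[1]
    consensus = np.zeros((m, m))
    for r in range(runs):
        _, H = pg_nmf(X, k, seed=r)  # each restart uses a different random init
        Y = H.T  # each sample becomes a k-dimensional point in the NMF subspace
        labels = average_linkage(kernel_distance(Y, gamma), c)
        # ASSUMPTION: consensus = fraction of runs in which two samples share a cluster.
        consensus += labels[:, None] == labels[None, :]
    consensus /= runs
    # Final hierarchical clustering on the consensus dissimilarity 1 - C.
    return average_linkage(1.0 - consensus, c), consensus
```

For example, with a 20 × 12 matrix whose first six samples are enriched in the first ten features and whose last six are enriched in the remaining ten, `subspace_consensus_kernel_clustering(X, k=2, c=2, runs=5)` should give the two groups different labels. Replacing `pg_nmf` with another transform, or `average_linkage` with another kernel clustering routine, gives other members of the algorithm family described above.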
