Li Jianqiang, Wang Fei
IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):514-521. doi: 10.1109/TCBB.2016.2591545. Epub 2016 Aug 29.
Recent developments in microarray gene expression techniques have made phenotype classification possible for many diseases. However, in gene expression data analysis, each sample is represented by a very large number of genes, many of which are redundant or irrelevant to characterizing the disease. How to efficiently select the most useful genes has therefore become one of the most active research topics in gene expression data analysis. In this paper, a novel unsupervised two-stage coarse-fine gene selection method is proposed. In the first stage, we apply the k-means algorithm to over-cluster the genes and discard some redundant genes. In the second stage, we select the most representative genes from the remaining ones based on matrix factorization. Finally, experimental results on several data sets are presented to show the effectiveness of our method.
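The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not say how redundant genes are discarded within a cluster or which matrix factorization is used, so everything below is an assumption made for illustration: the coarse stage keeps the single gene nearest each k-means centroid, and the fine stage substitutes a greedy pivoted Gram-Schmidt column selection (QR with column pivoting) for the paper's factorization. The function names, the parameters `k` and `m`, and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def coarse_stage(genes, k, seed=0):
    """Assumed coarse rule: over-cluster, keep the gene nearest each centroid."""
    labels, centroids = kmeans(genes, k, seed=seed)
    keep = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keep.append(min(members, key=lambda i: sq_dist(genes[i], centroids[c])))
    return sorted(keep)


def fine_stage(genes, candidates, m):
    """Stand-in fine rule: greedy pivoted Gram-Schmidt column selection."""
    residual = {i: list(genes[i]) for i in candidates}
    selected = []
    for _ in range(min(m, len(candidates))):
        # Pick the gene with the most expression not yet explained.
        best = max(residual, key=lambda i: sum(x * x for x in residual[i]))
        norm_sq = sum(x * x for x in residual[best])
        if norm_sq < 1e-12:  # remaining genes add nothing new
            break
        q = [x / norm_sq ** 0.5 for x in residual.pop(best)]
        selected.append(best)
        # Remove the chosen direction from every remaining gene.
        for i in list(residual):
            proj = sum(a * b for a, b in zip(residual[i], q))
            residual[i] = [a - proj * b for a, b in zip(residual[i], q)]
    return selected


# Toy data: one row per gene (expression across 6 samples), i.e. the
# transpose of the usual samples-by-genes matrix. Genes come in three
# nearly duplicated pairs.
genes = [
    [1.00, 0, 0, 1.00, 0, 0],
    [1.01, 0, 0, 0.99, 0, 0],
    [0, 1.00, 0, 0, 1.00, 0],
    [0, 0.99, 0, 0, 1.01, 0],
    [0, 0, 1.00, 0, 0, 1.00],
    [0, 0, 1.01, 0, 0, 0.99],
]

kept = coarse_stage(genes, k=4)         # k chosen larger than the 3 true groups
selected = fine_stage(genes, kept, m=2)
print("kept after coarse stage:", kept)
print("selected after fine stage:", selected)
```

Choosing `k` larger than the expected number of gene groups is what makes the first stage an over-clustering: clusters stay small and homogeneous, so dropping all but one member discards little information. The second stage then works on a much smaller candidate set, which is where a more expensive factorization-based criterion becomes affordable.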