School of Computer Science, China University of Geosciences, Wuhan, 430074, People's Republic of China.
Institute of Cardiovascular Disease Research, Huai'an Second People's Hospital Affiliated to Xuzhou Medical College, Huai'an, 223002, People's Republic of China.
Med Biol Eng Comput. 2018 Jul;56(7):1271-1284. doi: 10.1007/s11517-017-1751-6. Epub 2017 Dec 19.
With the rapid development of DNA microarray technology, large amounts of genomic data have been generated. Classifying these microarray data is a challenging task because gene expression data typically contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed that removes irrelevant and redundant genes and selects the best subset of genes for microarray data. Compared with the original data, the selected gene subset can benefit the classification task. We formulate gene selection as a manifold regularized subspace learning problem. Specifically, a projection matrix is used to project the original high-dimensional microarray data into a lower-dimensional subspace, under the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a graph Laplacian regularization term on the low-dimensional data space. The projection matrix can then serve as an indicator of the importance of each gene. An iterative update algorithm is developed to solve the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method outperforms other state-of-the-art methods in microarray data classification.
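The abstract does not state the objective function or the update rules, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes an objective of the form ||X - XWH||_F^2 + alpha * Tr((XW)^T L (XW)), where X is the samples-by-genes matrix, W is the projection matrix, H maps the subspace back to the genes, and L is the Laplacian of a k-nearest-neighbour graph over the samples. Plain gradient descent with backtracking stands in for the paper's iterative update algorithm, and all names (`knn_laplacian`, `fit`, `select_genes`, `alpha`, `k`) are hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetrized kNN graph over the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a sample is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbours per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                            # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def objective(X, W, H, L, alpha):
    """Assumed objective: reconstruction error plus graph Laplacian penalty."""
    Y = X @ W                                         # low-dimensional representation
    return np.sum((X - Y @ H) ** 2) + alpha * np.trace(Y.T @ L @ Y)

def fit(X, n_components=3, alpha=0.1, k=5, n_iter=200, seed=0):
    """Gradient descent with backtracking (a stand-in, not the paper's update rule)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    L = knn_laplacian(X, k)
    W = 0.01 * rng.standard_normal((d, n_components))
    H = 0.01 * rng.standard_normal((n_components, d))
    history = [objective(X, W, H, L, alpha)]
    step = 1.0
    for _ in range(n_iter):
        R = X - X @ W @ H                             # reconstruction residual
        gW = -2.0 * X.T @ R @ H.T + 2.0 * alpha * X.T @ (L @ (X @ W))
        gH = -2.0 * (X @ W).T @ R
        while step > 1e-12:                           # halve the step until no increase
            W_new, H_new = W - step * gW, H - step * gH
            f_new = objective(X, W_new, H_new, L, alpha)
            if f_new <= history[-1]:
                break
            step *= 0.5
        else:
            break                                     # no acceptable step found; stop
        W, H = W_new, H_new
        history.append(f_new)
        step *= 2.0                                   # allow the step to grow again
    return W, H, history

def select_genes(W, m):
    """Rank genes by the L2 norm of their row in W and return the top m indices."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:m]
```

In this sketch a gene whose row in W has a large norm contributes strongly to the subspace, which is one common way to read a projection matrix as an importance indicator; the paper may use a different scoring rule or additional constraints on W.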