Liu Weixiang, Yuan Kehong, Ye Datian
Research Center of Biomedical Engineering, Life Science Division, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China.
J Biomed Inform. 2008 Aug;41(4):602-6. doi: 10.1016/j.jbi.2007.12.003. Epub 2007 Dec 23.
In microarray data analysis, each gene expression sample contains measurements for thousands of genes, and reducing this high dimensionality is useful both for visualization and for subsequent clustering of samples. Principal component analysis (PCA) is a commonly used method for this purpose, but it has limitations. Nonnegative matrix factorization (NMF) is a newer dimension reduction method. In this paper we compare NMF and PCA for dimension reduction. The reduced data are used for visualization and for clustering analysis via k-means on 11 real gene expression datasets. Before the clustering analysis, we use the NMF- and PCA-reduced data for visualization. The results on one leukemia dataset show that NMF can discover natural clusters and clearly detect one mislabeled sample, whereas PCA cannot. In the clustering analysis via k-means, NMF typically outperforms PCA. Our results demonstrate the superiority of NMF over PCA in reducing microarray data.
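The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not specify the NMF algorithm, the reduced rank, or any preprocessing, so the sketch assumes multiplicative-update NMF with a Frobenius-norm objective, PCA computed through an SVD of the centered data, a bare-bones k-means, and a small synthetic genes-by-samples matrix in place of real microarray data. All function names (`make_toy_data`, `nmf`, `pca`, `kmeans`) and parameter values are hypothetical.

```python
import numpy as np


def make_toy_data(n_genes=200, n_per_group=10, seed=0):
    """Synthetic nonnegative genes x samples matrix with two sample groups (illustrative only)."""
    rng = np.random.default_rng(seed)
    profile_a = rng.random(n_genes) * 5.0
    profile_b = rng.random(n_genes) * 5.0
    cols = [profile_a] * n_per_group + [profile_b] * n_per_group
    noise = rng.normal(0.0, 0.3, size=(n_genes, 2 * n_per_group))
    return np.clip(np.column_stack(cols) + noise, 0.0, None)


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (genes x samples) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def pca(V, k):
    """Return samples x k principal component scores for the columns of V."""
    X = V.T                           # samples x genes
    Xc = X - X.mean(axis=0)           # center each gene across samples
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


V = make_toy_data()
W, H = nmf(V, k=2)
nmf_labels = kmeans(H.T, k=2)         # each sample is represented by its column of H
pca_labels = kmeans(pca(V, k=2), k=2)
print("NMF + k-means labels:", nmf_labels)
print("PCA + k-means labels:", pca_labels)
```

In this sketch each sample's reduced representation is its column of `H` for NMF and its row of principal component scores for PCA. With a reduced rank of 2, either representation can also be drawn directly as a scatter plot for the visualization step.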