Wouters Luc, Göhlmann Hinrich W, Bijnens Luc, Kass Stefan U, Molenberghs Geert, Lewi Paul J
Center for Statistics, Limburgs Universitair Centrum, transnationale Universiteit Limburg, Universitaire Campus, gebouw D, B-3590 Diepenbeek, Belgium.
Biometrics. 2003 Dec;59(4):1131-9. doi: 10.1111/j.0006-341x.2003.00130.x.
This article describes three multivariate projection methods and compares their ability to identify clusters of biological samples and genes, using real-life gene expression data from leukemia patients. Principal component analysis (PCA) is shown to have the disadvantage that its resulting principal factors are not very informative, while in correspondence factor analysis (CFA) the distances between objects are difficult to interpret. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples as well as in identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments more appropriately than CFA and allows more flexible weighting of the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.
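The abstract does not spell out how SMA is computed. As a rough illustration only, the sketch below assumes the commonly described formulation of a spectral map: log-transform a strictly positive table, double-center it using row and column weights, and factor the result with a weighted singular value decomposition. The function name `spectral_map`, its parameters, and the choice of NumPy are assumptions made for this example and are not taken from the article; the article's own weighting scheme may differ.

```python
import numpy as np


def spectral_map(X, row_weights=None, col_weights=None, n_components=2):
    """Hypothetical sketch of a weighted spectral map (not the authors' code).

    X is a table of strictly positive values, e.g. genes (rows) by samples
    (columns). Weights must be strictly positive; they are normalized to sum
    to 1. Uniform weights are assumed when none are given.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("the log transform requires strictly positive values")
    n, p = X.shape

    w_r = np.ones(n) if row_weights is None else np.asarray(row_weights, dtype=float)
    w_c = np.ones(p) if col_weights is None else np.asarray(col_weights, dtype=float)
    if np.any(w_r <= 0) or np.any(w_c <= 0):
        raise ValueError("weights must be strictly positive in this sketch")
    w_r = w_r / w_r.sum()
    w_c = w_c / w_c.sum()

    # Step 1: log transform, so multiplicative effects become additive.
    Y = np.log(X)

    # Step 2: weighted double-centering removes row and column main effects.
    row_means = Y @ w_c            # weighted mean of each row
    col_means = w_r @ Y            # weighted mean of each column
    grand_mean = w_r @ Y @ w_c
    Z = Y - row_means[:, None] - col_means[None, :] + grand_mean

    # Step 3: weighted SVD, done as an ordinary SVD of the rescaled matrix.
    S = np.sqrt(w_r)[:, None] * Z * np.sqrt(w_c)[None, :]
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_components
    row_scores = (U[:, :k] * s[:k]) / np.sqrt(w_r)[:, None]
    col_loadings = Vt[:k].T / np.sqrt(w_c)[:, None]

    total = np.sum(s ** 2)
    explained = s[:k] ** 2 / total if total > 0 else np.zeros(min(k, s.size))
    return row_scores, col_loadings, explained, Z
```

In this sketch, down-weighting a less reliable row or column simply means passing it a smaller entry in `row_weights` or `col_weights`; plotting the first two columns of `row_scores` and `col_loadings` together would give a biplot of genes and samples.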