Dai Jian J, Lieu Linh, Rocke David
University of California, Davis, USA.
Stat Appl Genet Mol Biol. 2006;5:Article6. doi: 10.2202/1544-6115.1147. Epub 2006 Feb 24.
An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures incorporating those methods. A five-step assessment procedure is designed for the purpose. Predictive accuracy and computational efficiency of the methods are examined. Two gene expression data sets for tumor classification are used in the study.
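The kind of comparison described above can be illustrated with a minimal sketch. It is not the paper's five-step procedure, which the abstract does not spell out. Everything here is an assumption made for illustration: the synthetic "expression" matrix, the choice of two components, the single train/test split, the nearest-centroid classifier, and all function names. SIR is omitted because it requires inverting the gene covariance matrix, which is singular when genes outnumber samples, and so needs a preprocessing step that the abstract does not describe.

```python
import numpy as np

def pca_directions(X, k):
    """Top-k principal directions (as columns) of centered X; labels are ignored."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def pls_directions(X, y, k):
    """PLS1 via NIPALS: each direction maximizes covariance with the response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P = [], []
    for _ in range(k):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)          # weight vector
        t = Xc @ w                      # scores on this component
        p = Xc.T @ t / (t @ t)          # loadings
        Xc = Xc - np.outer(t, p)        # deflate X before the next component
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    # Rotation that maps centered (undeflated) data directly to PLS scores.
    return W @ np.linalg.inv(P.T @ W)

def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Classify each test point by its closest class centroid in the reduced space."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y_test).mean())

# Synthetic stand-in for a microarray data set (assumption): 80 samples,
# 500 "genes", of which the first 10 are shifted in class 1.
rng = np.random.default_rng(0)
n, p, k = 80, 500, 2
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 2.0

idx = rng.permutation(n)
train, test = idx[:60], idx[60:]
mu = X[train].mean(axis=0)              # center test data with training means

results = {}
for name, R in [("PCA", pca_directions(X[train], k)),
                ("PLS", pls_directions(X[train], y[train], k))]:
    Z_train, Z_test = (X[train] - mu) @ R, (X[test] - mu) @ R
    results[name] = nearest_centroid_accuracy(Z_train, y[train], Z_test, y[test])

print(results)  # test-set accuracy per dimension reduction method
```

The directions are estimated from the training samples only and then applied to the held-out samples, so the reported accuracy is not inflated by fitting the reduction on the test data. PCA ignores the class labels when choosing directions, whereas PLS uses them, which is the main structural difference between the two methods in this sketch.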