Tan Yongxi, Shi Leming, Tong Weida, Wang Charles
Department of Medicine, Cedars-Sinai Medical Center, David Geffen School of Medicine UCLA, Los Angeles, CA 90048, USA.
Nucleic Acids Res. 2005 Jan 7;33(1):56-65. doi: 10.1093/nar/gki144. Print 2005.
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem with microarray data is the difficulty of analyzing high-dimensional gene expression data, which typically have thousands of variables (genes) but far fewer observations (samples) and often show severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, we propose total principal component regression (TPCR), which classifies human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both the independent and the dependent variables. A salient feature of the method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (the independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation on four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to reduce the computation time dramatically. (MATLAB source code is available upon request.)
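The idea of regressing on latent components taken from the augmented matrix of predictors and responses can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so the code below is not the authors' TPCR implementation or their MATLAB code. It assumes that "augmented subspace" can be read as a truncated total-least-squares fit on the SVD of [X Y], with class labels encoded as one-hot indicator columns; the function names (`fit_augmented_pcr`, `predict_class`, `leave_one_out_accuracy`) and the synthetic data are hypothetical, and the fast kernel algorithm is not reproduced.

```python
import numpy as np


def fit_augmented_pcr(X, Y, k):
    """Truncated total-least-squares style fit (an assumed reading of TPCR).

    X: (n, p) expression matrix, Y: (n, q) response/indicator matrix,
    k: number of latent components kept from the SVD of [X Y].
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    # Augmented, column-centered matrix of predictors and responses.
    Z = np.hstack([X - x_mean, Y - y_mean])
    # Thin SVD: rows of Vt are right singular vectors in the joint (X, Y) space.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    p = X.shape[1]
    V11t = Vt[:k, :p]  # (k, p) part of the top-k vectors belonging to X
    V21t = Vt[:k, p:]  # (k, q) part belonging to Y
    # Minimum-norm B solving V11^T B = V21^T, so the rank-k approximations
    # of X and Y satisfy X_k B = Y_k.
    B = np.linalg.pinv(V11t) @ V21t
    return B, x_mean, y_mean


def predict_class(X, B, x_mean, y_mean):
    """Assign each sample to the class with the largest predicted indicator."""
    scores = (X - x_mean) @ B + y_mean
    return scores.argmax(axis=1)


def leave_one_out_accuracy(X, labels, n_classes, k):
    """Leave-one-out cross-validation: refit without sample i, then predict it."""
    Y = np.eye(n_classes)[labels]  # one-hot class indicators
    hits = 0
    for i in range(len(labels)):
        train = np.arange(len(labels)) != i
        B, xm, ym = fit_augmented_pcr(X[train], Y[train], k)
        hits += predict_class(X[i:i + 1], B, xm, ym)[0] == labels[i]
    return hits / len(labels)


if __name__ == "__main__":
    # Synthetic stand-in for microarray data: 30 samples, 200 "genes", 3 classes.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 10)
    centers = 2.0 * rng.normal(size=(3, 200))
    X = centers[labels] + rng.normal(size=(30, 200))
    print("LOO accuracy:", leave_one_out_accuracy(X, labels, n_classes=3, k=3))
```

Because the SVD is taken on the joint matrix, noise in the expression values and in the responses is treated symmetrically, which is one way to account for errors in the independent variables. The number of components `k` is a tuning choice here; the abstract does not say how it was selected.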