Liu Aiyi, Zhang Ying, Gehan Edmund, Clarke Robert
Biostatistics Unit, Lombardi Cancer Center, Georgetown University Medical Center, 3800 Reservoir Road, NW, Washington, DC 20007, USA.
Stat Med. 2002 Nov 30;21(22):3465-74. doi: 10.1002/sim.1263.
We propose a block principal component analysis method for extracting information from a database with many variables and relatively few subjects, such as a microarray gene expression database. The new procedure is computationally simple, and both theory and numerical results show it to be as efficient as ordinary principal component analysis for dimension reduction, variable selection, data visualization and classification. We illustrate the method by classifying cancer cell lines using the well-known National Cancer Institute gene expression microarray data on 60 human cancer cell lines (NCI60).
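The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of block principal component analysis: split the variables into blocks, run an ordinary PCA inside each block, then run a second PCA on the pooled block components. The function names (`leading_pcs`, `block_pca`), the contiguous equal-sized blocks, the number of components kept per block, and the synthetic 60-by-300 data matrix are all assumptions made for illustration. None of them is taken from the paper, and the data are not the NCI60 data.

```python
import numpy as np

def leading_pcs(X, k):
    """Scores of the rows of X on its first k principal components."""
    Xc = X - X.mean(axis=0)                     # center each variable
    # SVD of the centered data; U * S gives the principal component scores.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def block_pca(X, blocks, k_per_block=1, k_final=2):
    """Two-stage PCA sketch (hypothetical helper, not the authors' code).

    blocks: list of column-index arrays, one per block of variables.
    """
    # Stage 1: ordinary PCA within each block, keeping a few components.
    stage1 = [leading_pcs(X[:, idx], k_per_block) for idx in blocks]
    Z = np.hstack(stage1)                       # pooled block components
    # Stage 2: ordinary PCA on the much smaller pooled matrix.
    return leading_pcs(Z, k_final)

# Synthetic example: 60 subjects, 300 variables (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))
# Assumption: 10 contiguous blocks; how blocks are formed is not given
# in the abstract.
blocks = np.array_split(np.arange(300), 10)
scores = block_pca(X, blocks)
```

Each stage decomposes a matrix with far fewer columns than the full data, which is one way the computational simplicity claimed in the abstract could arise. The resulting `scores` array has one row per subject and could be passed to a plotting routine or a classifier.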