Biologically valid linear factor models of gene expression.

Author information

Girolami Mark, Breitling Rainer

Affiliation

Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, UK.

Publication information

Bioinformatics. 2004 Nov 22;20(17):3021-33. doi: 10.1093/bioinformatics/bth354. Epub 2004 Jun 16.

DOI:10.1093/bioinformatics/bth354

PMID:15201181

Abstract

MOTIVATION

Identifying the physiological processes that underlie and generate the expression patterns observed in microarray experiments is a major challenge. Principal component analysis (PCA) is a linear multivariate statistical method that is regularly employed for this purpose, as it provides a reduced-dimensional representation for subsequent study of possible biological processes responding to the particular experimental conditions. Making explicit the data assumptions underlying PCA highlights their lack of biological validity, which makes biological interpretation of the principal components problematic. A microarray data representation that enables clear biological interpretation is a desirable analysis tool.

RESULTS

We address this issue by employing the probabilistic interpretation of PCA and proposing alternative linear factor models which are based on refined biological assumptions. A practical study on two well-understood microarray datasets highlights the weakness of PCA and the greater biological interpretability of the linear models we have developed.
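
The abstract does not spell out the probabilistic interpretation of PCA it relies on. As a rough sketch, assuming it refers to the standard probabilistic PCA formulation (Tipping and Bishop), each observed expression profile can be written as a linear function of a small number of latent factors plus noise. The symbols below are illustrative and are not taken from the paper.

```latex
% Assumed notation (not from the paper):
%   x       : observed d-dimensional expression profile
%   z       : q-dimensional latent factor vector, q < d
%   W       : d x q factor loading matrix
%   \mu     : d-dimensional mean expression vector
%   \sigma^2: isotropic noise variance
\begin{align}
  \mathbf{x} &= \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}, \\
  \mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q), \qquad
  \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d), \\
  \mathbf{x} &\sim \mathcal{N}\!\left(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\top} + \sigma^2 \mathbf{I}_d\right).
\end{align}
```

Under this reading, the data assumptions behind PCA are explicit: the latent factors are Gaussian, zero-mean and uncorrelated, and the noise is isotropic. The maximum-likelihood estimate of W then spans the principal subspace of the data. Assumptions of this kind appear to be what the abstract describes as lacking biological validity, although the abstract itself does not say which ones the alternative models relax.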
