Girolami Mark, Breitling Rainer
Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, UK.
Bioinformatics. 2004 Nov 22;20(17):3021-33. doi: 10.1093/bioinformatics/bth354. Epub 2004 Jun 16.
The identification of physiological processes underlying and generating the expression pattern observed in microarray experiments is a major challenge. Principal component analysis (PCA) is a linear multivariate statistical method that is regularly employed for that purpose, as it provides a reduced-dimensional representation for subsequent study of possible biological processes responding to the particular experimental conditions. Making explicit the data assumptions underlying PCA highlights their lack of biological validity, thus making biological interpretation of the principal components problematic. A microarray data representation that enables clear biological interpretation is a desirable analysis tool.
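To make the reduced-dimensional representation concrete, the following is a minimal sketch of standard PCA computed through the singular value decomposition. It is not taken from the paper: the function name `pca_svd`, the matrix layout (genes as rows, experimental conditions as columns), and the random data are all assumptions chosen for illustration.

```python
import numpy as np

def pca_svd(X, n_components):
    """Standard PCA via SVD (illustrative helper, not from the paper).

    X is assumed to be a (genes x conditions) expression matrix.
    """
    # Centre each column (condition) so the components describe variance.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every gene in the reduced space.
    scores = Xc @ components.T
    # Sample variance captured by each retained component.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Hypothetical data: 200 genes measured under 6 conditions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
scores, components, explained_variance = pca_svd(X, 2)
```

The components are constrained to be mutually orthogonal and the scores mutually uncorrelated. Those are mathematical properties of the method, and nothing in the procedure ties them to biological processes.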
We address this issue by employing the probabilistic interpretation of PCA and proposing alternative linear factor models which are based on refined biological assumptions. A practical study on two well-understood microarray datasets highlights the weakness of PCA and the greater biological interpretability of the linear models we have developed.
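As a rough illustration of the probabilistic interpretation of PCA, the sketch below computes the standard closed-form maximum-likelihood estimate of probabilistic PCA as described by Tipping and Bishop, which assumes Gaussian latent factors and isotropic Gaussian noise. The abstract does not specify the alternative factor models, so none of them is reproduced here. The function name `ppca_ml` and the simulated data are assumptions.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimate for probabilistic PCA (illustrative helper).

    Assumed model: x = W z + mu + noise, with z ~ N(0, I) and
    noise ~ N(0, sigma2 * I). Requires q < number of columns of X.
    """
    Xc = X - X.mean(axis=0)
    n, d = Xc.shape
    S = Xc.T @ Xc / n                      # ML (biased) sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loading matrix, identifiable only up to a rotation of the latent space.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, sigma2

# Simulated data: 2 latent factors, 5 observed variables, small noise.
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))
W_true = rng.normal(size=(5, 2))
X = Z @ W_true.T + 0.1 * rng.normal(size=(500, 5))
W, sigma2 = ppca_ml(X, 2)
C = W @ W.T + sigma2 * np.eye(5)           # model covariance of x
```

Writing PCA as a generative model in this way puts its distributional assumptions in plain view, so that each one can be examined and, where it seems biologically implausible, replaced.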