Wolfram Liebermeister
Theoretische Biophysik, Institut für Biologie, Humboldt-Universität zu Berlin, Invalidenstrasse 42, 10115 Berlin, Germany.
Bioinformatics. 2002 Jan;18(1):51-60. doi: 10.1093/bioinformatics/18.1.51.
The expression of genes is controlled by specific combinations of cellular variables. We applied Independent Component Analysis (ICA) to gene expression data, deriving a linear model based on hidden variables, which we term 'expression modes'. The expression of each gene is a linear function of the expression modes. According to the ICA model, the linear influences of different modes show minimal statistical dependence, and their distributions deviate sharply from the normal distribution.
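The linear model can be illustrated with a small sketch. Writing the expression matrix X (genes × conditions) as X ≈ A Y, each row of A holds one gene's influences from the modes, and each row of Y holds one mode's profile across conditions. ICA then looks for influences A whose columns are as independent and non-Gaussian as possible, treating genes as the samples. The code below is a hypothetical illustration, not the study's implementation. It assumes column-centered data, a symmetric FastICA iteration with a tanh contrast (one common ICA algorithm, chosen here for brevity), and synthetic data with Laplace-distributed influences. The names `whiten`, `fastica`, and `expression_modes` are invented for this example.

```python
import numpy as np

def whiten(X, k):
    """Center X (genes x conditions) column-wise and project it onto the top-k
    principal directions, scaled so the result has identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)          # ascending eigenvalues
    top = np.argsort(eigval)[::-1][:k]
    K = eigvec[:, top] / np.sqrt(eigval[top])     # conditions x k
    return Xc @ K, K, Xc

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fastica(Z, n_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with the tanh contrast on whitened data Z (n x k)."""
    n, k = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                      # n x k
        # Fixed-point update w <- E[z g(w.z)] - E[g'(w.z)] w, one row per component.
        W_new = G.T @ Z / n - np.diag((1 - G**2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W

def expression_modes(X, k):
    """Return influences A (genes x k) and modes Y (k x conditions)
    such that X minus its column means is approximately A @ Y."""
    Z, _, Xc = whiten(X, k)
    W = fastica(Z)
    A = Z @ W.T                                   # estimated independent influences
    Y = np.linalg.lstsq(A, Xc, rcond=None)[0]     # least-squares mode profiles
    return A, Y

# Synthetic demo: 2000 genes, 12 conditions, 3 hidden modes.
rng = np.random.default_rng(1)
A_true = rng.laplace(size=(2000, 3))              # heavy-tailed influences
Y_true = rng.standard_normal((3, 12))
X = A_true @ Y_true + 0.05 * rng.standard_normal((2000, 12))

A_est, Y_est = expression_modes(X, k=3)
# Best absolute correlation of each true influence column with a recovered one.
corr = np.abs(np.corrcoef(A_true.T, A_est.T)[:3, 3:])
print(corr.max(axis=1))
```

ICA determines the modes only up to order, sign, and scale, which is why the check compares absolute correlations against the best-matching recovered component.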
In a study of cell cycle-related gene expression in yeast, we found that the dominant expression modes could be related to distinct biological functions, such as phases of the cell cycle or the mating response. Analysis of human lymphocytes revealed modes that were related to characteristic differences between cell types. In both data sets, the linear influences of the dominant modes showed heavy-tailed distributions, indicating the existence of specifically up- and downregulated target genes. The expression modes and their influences can be used to visualize the samples and genes in low-dimensional spaces. A projection onto expression modes helps to highlight particular biological functions, to reduce noise, and to compress the data in a biologically sensible way.
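The heavy-tail observation and the projection step can be made concrete as follows. Assuming influences A (genes × modes) and modes Y (modes × conditions) have already been estimated so that X ≈ A Y, excess kurtosis is one simple measure of how far a mode's influence distribution departs from a normal one, and genes far out in either tail are candidate up- or downregulated targets. Keeping only selected modes gives a reduced reconstruction of the data. Excess kurtosis, the cutoff z = 3, and the helper names `excess_kurtosis`, `tail_genes`, and `project` are illustrative assumptions rather than details from the study, and the data are synthetic.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for a normal distribution, > 0 for heavy tails."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

def tail_genes(influence, z=3.0):
    """Indices of genes whose influence lies more than z standard deviations
    above (first array) or below (second array) the mean; z=3 is an arbitrary cutoff."""
    s = (influence - influence.mean()) / influence.std()
    return np.flatnonzero(s > z), np.flatnonzero(s < -z)

def project(A, Y, modes):
    """Rank-reduced expression data rebuilt from the selected modes only."""
    return A[:, modes] @ Y[modes, :]

# Synthetic influences: mode 0 heavy-tailed (Laplace), mode 1 Gaussian.
rng = np.random.default_rng(0)
A = np.column_stack([rng.laplace(size=20000), rng.standard_normal(20000)])  # genes x modes
Y = rng.standard_normal((2, 8))                                              # modes x conditions

print([round(excess_kurtosis(A[:, j]), 2) for j in range(2)])
up, down = tail_genes(A[:, 0])
X_mode0 = project(A, Y, [0])   # data as explained by mode 0 alone
```

Because ICA fixes each mode only up to sign, which tail counts as "up" depends on the sign convention chosen for that mode. For the low-dimensional plots, each gene can be drawn at the coordinates given by two columns of A, and each sample j at (Y[m1, j], Y[m2, j]) for the same two modes m1 and m2.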