Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, D-10115 Berlin, Germany.
BMC Bioinformatics. 2013 Apr 21;14:133. doi: 10.1186/1471-2105-14-133.
The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns, reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by the use of clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation usually focuses on the time and phase of expression of the resulting groups of genes. However, the use of microarray technology in such studies requires normalization of the data during pre-processing, and the impact of this step on the qualitative and quantitative features of the derived information, namely the number of oscillating transcripts and their respective phases, is unclear.
A microarray-based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the expression phase, derived by Fourier transformation, before and after the application of quantile normalization, median polishing, cyclic LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile normalization is found to introduce a phase shift of 180°, effectively changing night-expressed genes into day-expressed ones. Comparison of a large number of clustering results obtained from differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, play only minor roles. We find that standardization and the discrete Fourier transform (DFT) transformation are favorable for the clustering of time series, in contrast to the 12 m transformation. We use the cluster-wise functional enrichment of a clustering obtained with LOS normalization, DFT transformation, and the flowClust algorithm to derive the diurnal biological program of Synechocystis sp.
Application of quantile normalization, median polishing, and cyclic LOESS normalization to the presented cyanobacterial dataset leads to an increased number of oscillating genes and a systematic shift of the expression phase. LOS normalization minimizes the observed detrimental effects. As previous analyses employed a variety of different normalization methods, direct comparisons of their results must be treated with caution.
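The phase comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`dft_phase`, `quantile_normalize`, `phase_shift`), the choice of the first DFT harmonic, the sign convention for the phase, and the synthetic data are all assumptions made for illustration. The sketch computes a per-gene expression phase from the DFT, applies a basic quantile normalization across arrays, and reports the per-gene change in phase.

```python
import numpy as np


def dft_phase(x, harmonic=1):
    """Peak phase in degrees [0, 360) of one DFT harmonic per row.

    Assumed convention: a cosine peaking at the first time point has
    phase 0, and a later peak gives a larger phase.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    coeffs = np.fft.fft(x, axis=1)
    return np.degrees(-np.angle(coeffs[:, harmonic])) % 360.0


def quantile_normalize(data):
    """Basic quantile normalization of a genes x arrays matrix.

    Every column (array / time point) is forced onto the same
    distribution: the mean of the sorted columns. Ties are broken by
    sort order rather than averaged (a simplification).
    """
    data = np.asarray(data, dtype=float)
    ranks = np.argsort(np.argsort(data, axis=0), axis=0)
    mean_sorted = np.sort(data, axis=0).mean(axis=1)
    return mean_sorted[ranks]


def phase_shift(before, after):
    """Circular difference after - before, in degrees [-180, 180)."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0


# Synthetic data (hypothetical): 200 genes, 12 time points over one cycle.
rng = np.random.default_rng(0)
n_genes, n_times = 200, 12
t = np.arange(n_times)
peaks = rng.uniform(0, n_times, size=n_genes)
raw = (
    8.0
    + np.cos(2 * np.pi * (t[None, :] - peaks[:, None]) / n_times)
    + rng.normal(0.0, 0.1, size=(n_genes, n_times))
)

# Per-gene phase change introduced by the normalization step.
shift = phase_shift(dft_phase(raw), dft_phase(quantile_normalize(raw)))
```

A histogram of `shift` is one way to check whether a normalization method leaves phases intact (values near 0°) or biases them systematically. The synthetic data here only exercise the functions and say nothing about the outcome on the Synechocystis dataset.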