BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey, USA.
BMC Bioinformatics. 2010 May 26;11:279. doi: 10.1186/1471-2105-11-279.
Microarray technology is a powerful and widely accepted experimental technique in molecular biology that allows the study of genome-wide transcriptional responses. However, experimental data usually contain sources of uncertainty, and many experiments are therefore now designed with repeated measurements to better assess this inherent variability. Many computational methods have been proposed to account for the variability in replicates. As yet, however, no model outputs expression profiles that already account for replicate information, so that the variety of computational models that take expression profiles as input could exploit this information without any modification.
We propose a methodology that integrates replicate variability into expression profiles to generate so-called 'true' expression profiles. The study addresses two issues: (i) developing a statistical model that can estimate 'true' expression profiles that are more robust than the average profile, and (ii) extending our previous micro-clustering approach, which was designed specifically for clustering time-series expression data. The model utilizes a previously proposed error model and the concept of 'relative difference'. Clustering effectiveness is demonstrated on synthetic data, where several methods are compared. We subsequently analyze in vivo rat data to elucidate circadian transcriptional dynamics as well as liver-specific corticosteroid-induced changes in gene expression.
We have proposed a model that integrates the error information from repeated measurements into the expression profiles. Using numerous synthetic and real time-series datasets, we demonstrated that the approach improves clustering performance and assists in the identification and selection of informative expression motifs.