Asif H M Shahzad, Sanguinetti Guido
Stat Appl Genet Mol Biol. 2013 Oct 1;12(5):545-57. doi: 10.1515/sagmb-2012-0010.
We present a novel method for simultaneous inference and nonparametric clustering of transcriptional dynamics from gene expression data. The proposed method uses gene expression data to infer time-varying transcription factor (TF) profiles and to cluster these temporal profiles according to the dynamics they exhibit. We use the latent structure of a factorial hidden Markov model to model the TF profiles as Markov chains, and we cluster these profiles using nonparametric mixture modeling. An efficient Gibbs sampling scheme is proposed for inferring the latent variables and grouping the transcriptional dynamics into an a priori unknown number of clusters. We test our model on simulated data and analyse its performance on two expression datasets: S. cerevisiae cell cycle data and E. coli oxygen starvation response data. Our results show the applicability of the method to genome-wide analysis of expression data.
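The idea of grouping profiles into an a priori unknown number of clusters can be illustrated with a minimal sketch. The abstract does not say which nonparametric prior is used, so the Chinese restaurant process below is an assumption, chosen because it is a common construction for nonparametric mixture models. The function name `crp_assignments` and the concentration parameter `alpha` are hypothetical and do not come from the paper. The sketch draws cluster labels from the prior only; it does not implement the factorial hidden Markov model or the Gibbs sampler.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process prior.

    alpha (> 0) is the concentration parameter: larger values make new
    clusters more likely. Hypothetical helper, not the paper's sampler.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to item i
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=20, alpha=1.0)
    print(len(counts), "clusters with sizes", counts)
```

The number of clusters is not fixed in advance: it is whatever `len(counts)` turns out to be after all items have been assigned. In a full sampler, these prior weights would be combined with the likelihood of each profile under each cluster's dynamics.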