Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria.
PLoS One. 2013 Jul 23;8(7):e68358. doi: 10.1371/journal.pone.0068358. Print 2013.
Inferring gene regulatory networks from expression data is difficult, yet it is common and often useful. Most network problems are under-determined (there are more parameters than data points), so reducing the data or the parameter set is often necessary. Correlation between variables in the model further confounds the inference of network coefficients. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. Using public, simulated time-course data sets from the DREAM4 Challenge, we show that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforms (1 out of 25), and often selects a non-clustering model when that model better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.
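The under-determination argument can be made concrete with a small sketch. The Python below is not the BACON source (which is GNU Octave) and does not implement the variational Bayesian fit. It assumes a first-order linear model x[t+1] = A x[t], hard cluster labels that are already given, and ordinary least squares as a stand-in for the Bayesian inference. All function names and the toy sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def count_unknowns(n_genes, n_timepoints, n_clusters=None):
    """Return (unknown coefficients, scalar equations) for x[t+1] = A @ x[t].

    If n_clusters is given, the network is modelled between clusters
    rather than between individual genes (an illustrative assumption).
    """
    n_nodes = n_genes if n_clusters is None else n_clusters
    return n_nodes ** 2, n_nodes * (n_timepoints - 1)


def cluster_means(X, labels, n_clusters):
    """Average the rows (genes) of X that share a hard cluster label."""
    return np.vstack([X[labels == k].mean(axis=0) for k in range(n_clusters)])


def fit_transition(Z):
    """Least-squares estimate of A in Z[:, t+1] ~= A @ Z[:, t]."""
    past, future = Z[:, :-1], Z[:, 1:]
    # lstsq solves past.T @ A.T ~= future.T for A.T
    A_T, *_ = np.linalg.lstsq(past.T, future.T, rcond=None)
    return A_T.T


# Toy sizes (arbitrary): 100 genes observed at 21 time points.
print(count_unknowns(100, 21))     # (10000, 2000): more unknowns than equations
print(count_unknowns(100, 21, 5))  # (25, 100): over-determined after clustering
```

In this simplified picture, grouping 100 genes into 5 clusters shrinks the coefficient matrix from 10,000 entries to 25, and averaging correlated genes into one cluster profile removes near-duplicate predictors. The abstract's method differs in that clustering and network inference are performed jointly and probabilistically, not as two separate steps.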