Peter Salzman, Anthony Almudevar
University of Rochester.
Stat Appl Genet Mol Biol. 2006;5:Article21. doi: 10.2202/1544-6115.1208. Epub 2006 Aug 31.
Statistical inference of graphical models has become an important tool in the reconstruction of biological networks, such as those that model gene regulatory interactions. In particular, the construction of a score-based Bayesian posterior density over the space of models provides an intuitive and computationally feasible method of assessing model uncertainty and of assigning statistical confidence to structural features. A problem that frequently occurs with this approach is the tendency to overestimate the degree of model complexity. Spurious graphical features obtained in this way may affect the inference in unpredictable ways, even when using scoring techniques, such as the Bayesian Information Criterion (BIC), that are specifically designed to compensate for overfitting. In this article we propose a simple adjustment to a BIC-based scoring procedure. The method proceeds in two steps. In the first step, we derive an independent estimate of the parametric complexity of the model. In the second, we modify the BIC score so that the mean parametric complexity of the posterior density is equal to the estimated value. The method is applied to a set of test networks and to a collection of genes from the yeast genome known to possess regulatory relationships. A Bayesian network model with binary responses is employed. In the examples considered, we find that the number of spurious graph edges inferred is reduced, while the effect on the identification of true edges is minimal.
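The abstract does not state the exact form of the modified score, so the following is only a minimal sketch of the second step under stated assumptions, not the authors' procedure. It assumes the adjustment is a multiplier λ on the usual BIC penalty, score(M) = -2 log L(M) + λ k(M) log n, with posterior weights proportional to exp(-score/2), and that the independent complexity estimate k̂ from the first step is already available. It also replaces the posterior over network structures with a small explicit list of candidate models; the function names, the toy log-likelihoods and the target value are all hypothetical. In a binary Bayesian network, a node with p binary parents contributes 2^p free parameters, which is one way k(M) could be counted.

```python
import math

# Hypothetical toy data: three candidate models with increasing complexity.
LOG_LIKS = [-120.0, -112.0, -110.5]  # maximised log-likelihoods (invented)
N_PARAMS = [1, 2, 3]                 # parametric complexity k of each model
N_OBS = 100                          # sample size n


def bic_scores(log_liks, n_params, n_obs, penalty=1.0):
    """Assumed adjusted BIC: -2 log L + penalty * k * log n (smaller is better)."""
    return [-2.0 * ll + penalty * k * math.log(n_obs)
            for ll, k in zip(log_liks, n_params)]


def posterior_weights(scores):
    """Normalised weights proportional to exp(-score / 2), computed stably."""
    half = [-s / 2.0 for s in scores]
    top = max(half)
    w = [math.exp(h - top) for h in half]
    total = sum(w)
    return [x / total for x in w]


def mean_complexity(log_liks, n_params, n_obs, penalty):
    """Posterior mean of k under the penalised score."""
    w = posterior_weights(bic_scores(log_liks, n_params, n_obs, penalty))
    return sum(wi * k for wi, k in zip(w, n_params))


def calibrate_penalty(log_liks, n_params, n_obs, target_k,
                      lo=1e-3, hi=100.0, tol=1e-10):
    """Bisect on the penalty multiplier so the posterior mean of k equals target_k.

    The posterior mean of k is non-increasing in the penalty (its derivative is
    -0.5 * log(n) * Var(k)), so a sign change on [lo, hi] brackets the root.
    """
    def gap(lam):
        return mean_complexity(log_liks, n_params, n_obs, lam) - target_k

    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("target_k is not reachable for penalties in [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # still too complex on average: the root lies above mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    k_hat = 1.8  # hypothetical independent estimate from the first step
    lam = calibrate_penalty(LOG_LIKS, N_PARAMS, N_OBS, k_hat)
    print("standard BIC mean k:", mean_complexity(LOG_LIKS, N_PARAMS, N_OBS, 1.0))
    print("calibrated penalty:", lam)
```

With these invented numbers the standard BIC (λ = 1) gives a posterior mean complexity of roughly 2.3, above the assumed estimate of 1.8, so the calibrated multiplier comes out greater than 1. This illustrates how matching the mean complexity strengthens the penalty when the unadjusted posterior overestimates complexity.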