Libbrecht Maxwell W, Hoffman Michael M, Bilmes Jeffrey A, Noble William S
Genome Sciences, Box 355065, Foege Building, S220B, 3720 15th Ave NE, Seattle, WA 98195-5065.
Princess Margaret Cancer Centre, Toronto Medical Discovery Tower 11-311, 101 College St, Toronto, ON M5G 1L7.
Proc Int Conf Mach Learn. 2015 Jul;37:1992-2001.
Graph smoothness objectives have achieved great success in semi-supervised learning but have not yet been applied extensively to unsupervised generative models. We define a new class of entropic graph-based posterior regularizers that augment a probabilistic model by encouraging pairs of nearby variables in a regularization graph to have similar posterior distributions. We present a three-way alternating optimization algorithm with closed-form updates for performing inference on this joint model and learning its parameters. This method admits updates linear in the degree of the regularization graph, exhibits monotone convergence, and is easily parallelizable. We are motivated by applications in computational biology in which temporal models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. On a synthetic problem, we show that our method outperforms existing methods for graph-based regularization and a comparable strategy for incorporating long-range interactions using existing methods for approximate inference. Using genome-scale functional genomics data, we integrate genome 3D interaction data into existing models for genome annotation and demonstrate significant improvements in predicting genomic activity.
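As a rough illustration of what "encouraging pairs of nearby variables in a regularization graph to have similar posterior distributions" could mean, the sketch below scores a set of discrete posteriors with a weighted sum of KL divergences over graph edges. The KL form, the names `kl_divergence` and `graph_penalty`, and the toy posteriors are assumptions made for illustration. The abstract does not give the regularizer's exact definition, and this sketch does not implement the three-way alternating optimization.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    eps is a small smoothing constant (an assumption of this sketch) that
    avoids taking the log of zero.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def graph_penalty(posteriors, edges):
    """Weighted sum of KL divergences over the edges of a regularization graph.

    posteriors: dict mapping a variable index to its list of label probabilities
    edges: list of (i, j, weight) tuples (hypothetical edge representation)
    """
    return sum(w * kl_divergence(posteriors[i], posteriors[j]) for i, j, w in edges)

# Toy posteriors over two labels for three variables.
posteriors = {
    0: [0.9, 0.1],
    1: [0.8, 0.2],
    2: [0.1, 0.9],
}

# Variables 0 and 1 agree closely; variables 0 and 2 disagree.
similar = graph_penalty(posteriors, [(0, 1, 1.0)])
dissimilar = graph_penalty(posteriors, [(0, 2, 1.0)])
print(similar < dissimilar)
```

Under this assumed form, an edge joining variables with similar posteriors contributes a small penalty and an edge joining dissimilar ones contributes a large penalty. Each variable's share of the penalty involves only its own neighbors, which is consistent with the abstract's statement that updates are linear in the degree of the graph.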