Justifying and generalizing contrastive divergence.
Bengio Yoshua, Delalleau Olivier
Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada.
Neural Comput. 2009 Jun;21(6):1601-21. doi: 10.1162/neco.2008.11-07-647.
We study an expansion of the log likelihood in undirected graphical models such as the restricted Boltzmann machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector in RBMs). We are particularly interested in estimators of the gradient of the log likelihood obtained through this expansion. We show that its residual term converges to zero, justifying the use of a truncation (running only a short Gibbs chain), which is the main idea behind the contrastive divergence (CD) estimator of the log-likelihood gradient. By truncating even more, we obtain a stochastic reconstruction error, related through a mean-field approximation to the reconstruction error often used to train autoassociators and stacked autoassociators. The derivation is not specific to the particular parametric forms used in RBMs and requires only convergence of the Gibbs chain. We present theoretical and empirical evidence linking the number of Gibbs steps k and the magnitude of the RBM parameters to the bias in the CD estimator. These experiments also suggest that the sign of the CD estimator is correct most of the time, even when the bias is large, so that CD-k is a good descent direction even for small k.
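For reference, the truncation the abstract describes can be written in standard RBM notation (a sketch only; the paper's own expansion is more general and is not reproduced here). For an RBM with energy function E(v, h) and parameters theta, the exact log-likelihood gradient and its CD-k truncation are

\frac{\partial \log p(v_0)}{\partial \theta}
  = -\,\mathbb{E}\!\left[\frac{\partial E(v_0,h)}{\partial \theta}\,\middle|\,v_0\right]
    + \mathbb{E}\!\left[\frac{\partial E(v,h)}{\partial \theta}\right],
\qquad
\mathrm{CD}_k
  = -\,\mathbb{E}\!\left[\frac{\partial E(v_0,h)}{\partial \theta}\,\middle|\,v_0\right]
    + \mathbb{E}\!\left[\frac{\partial E(v_k,h)}{\partial \theta}\,\middle|\,v_k\right],

where v_k is the visible sample after k full steps of the alternating Gibbs chain v_0 -> h_0 -> v_1 -> ... started at the training example v_0. The second expectation in the exact gradient is over the model distribution and is intractable; the bias of CD-k is the residual term whose convergence to zero (as k grows, given convergence of the Gibbs chain) the paper establishes.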
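As a concrete illustration of the truncated chain, here is a minimal numpy sketch of one CD-k gradient estimate for a binary RBM. All names (sigmoid, cd_k, W, b, c) are illustrative and not taken from the paper; this is the standard CD-k procedure, not the authors' experimental code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(W, b, c, v0, k, rng):
    """One CD-k gradient estimate for a binary RBM.

    W: (n_visible, n_hidden) weight matrix
    b: (n_visible,) visible biases; c: (n_hidden,) hidden biases
    v0: one binary training vector of shape (n_visible,)
    Returns (dW, db, dc), the truncated (biased) estimate of the
    log-likelihood gradient, to be followed by gradient ascent.
    """
    # Positive phase: exact conditional expectation of h given the data v0.
    ph0 = sigmoid(v0 @ W + c)
    vk = v0.copy()
    # Run k full steps of the alternating Gibbs chain v -> h -> v.
    for _ in range(k):
        ph = sigmoid(vk @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden units
        pv = sigmoid(h @ W.T + b)
        vk = (rng.random(pv.shape) < pv).astype(float)  # sample visible units
    # Negative phase: conditional expectation of h given the chain sample v_k.
    phk = sigmoid(vk @ W + c)
    dW = np.outer(v0, ph0) - np.outer(vk, phk)
    db = v0 - vk
    dc = ph0 - phk
    return dW, db, dc

A training step then moves the parameters a small distance along (dW, db, dc). With k = 1 this is the classic CD-1 update; larger k reduces the bias at the cost of more Gibbs steps, which is exactly the trade-off the paper's theoretical and empirical results quantify.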