Chi-Ken Lu, Patrick Shafto
Mathematics and Computer Science, Rutgers University, Newark, NJ 07102, USA.
School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA.
Entropy (Basel). 2021 Oct 23;23(11):1387. doi: 10.3390/e23111387.
It is desirable to combine the expressive power of deep learning with Gaussian processes (GPs) in a single Bayesian learning model. Deep kernel learning achieved this by using a deep network for feature extraction and a GP as the function model. Recently, it was suggested that, despite training with the marginal likelihood, the deterministic nature of the feature extractor can lead to overfitting, and that replacing it with a Bayesian network appears to cure this. Here, we propose the conditional deep Gaussian process (DGP), in which the intermediate GPs in the hierarchical composition are supported by hyperdata while the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but they are hyperparameters rather than random variables. Following our previous moment-matching approach, we approximate the marginal prior of the conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood, which depends on the hyperdata implicitly through the kernel. We show equivalence with deep kernel learning in the limit of dense hyperdata in latent space; however, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate the expressive power gained from the depth of the hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference, and deep kernel learning. We also address the non-Gaussian aspects of our model as well as ways of upgrading to a fully Bayesian inference.
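As a concrete illustration of the moment-matching step, the following is a minimal sketch for a two-layer composition, assuming a squared-exponential outer kernel; the symbols $k_1$, $k_2$, $m$, $\mu$, $s^2$, and $\ell$ are our notation for this sketch, not necessarily the paper's.

% Sketch (our assumptions): inner layer conditioned on the hyperdata,
% f ~ GP(m, k_1); outer kernel k_2 squared exponential with length scale \ell.
\[
  k_{\mathrm{eff}}(x, x') = \mathbb{E}_{f}\!\left[ k_2\!\big(f(x), f(x')\big) \right],
  \qquad
  k_2(u, v) = \exp\!\left( -\frac{(u - v)^2}{2\ell^2} \right).
\]
% The difference z = f(x) - f(x') is Gaussian under the conditioned inner GP,
% z ~ N(\mu, s^2), with \mu = m(x) - m(x') and
% s^2 = k_1(x, x) + k_1(x', x') - 2 k_1(x, x'),
% so the expectation is a closed-form Gaussian integral:
\[
  k_{\mathrm{eff}}(x, x') = \frac{\ell}{\sqrt{\ell^2 + s^2}}
  \exp\!\left( -\frac{\mu^2}{2(\ell^2 + s^2)} \right).
\]

Under this sketch, the approximate marginal likelihood of the exposed GP depends on the hyperdata only through the conditioned mean $m$ and covariance $k_1$, i.e., implicitly through $k_{\mathrm{eff}}$, consistent with the empirical Bayes learning described above.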