IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):243-55. doi: 10.1109/TPAMI.2014.2315802.
We propose the supervised hierarchical Dirichlet process (sHDP), a nonparametric generative model for the joint distribution of a group of observations and a response variable directly associated with that whole group. We compare the sHDP with another leading method for regression on grouped data, the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method on two real-world classification problems and two real-world regression problems. Bayesian nonparametric regression models based on the Dirichlet process, such as the Dirichlet process-generalised linear models (DP-GLM), have previously been explored; these models allow flexibility in modelling nonlinear relationships. However, until now, hierarchical Dirichlet process (HDP) mixtures have not seen significant use in supervised problems with grouped data, since a straightforward application of the HDP to the grouped data results in learnt clusters that are not predictive of the responses. The sHDP solves this problem by allowing clusters to be learnt jointly from the group structure and from the label assigned to each group.