Gray-Davies Tristan, Holmes Chris C, Caron François
Department of Statistics, University of Oxford.
Electron J Stat. 2016 Jul 18;10(2):1807-1828. doi: 10.1214/15-EJS1032.
We present a novel Bayesian nonparametric regression model for covariates x and a continuous response variable y ∈ ℝ. The model is parametrized in terms of marginal distributions for y and x and a regression function which tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows standard, existing software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset with over 1,300,000 data points and more than 100 covariates.
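For readers unfamiliar with the Plackett-Luce ranking model mentioned above, the following is a minimal sketch of its standard log-likelihood for a single ranking. It is not the authors' implementation: the function name `plackett_luce_log_likelihood` and the arguments `scores` and `ranking` are hypothetical, and how the scores would be tied to the regression function and covariates in the paper is not specified in this abstract.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` under a Plackett-Luce model.

    scores  -- log-weights, one per item (hypothetical input)
    ranking -- item indices ordered from first-ranked to last-ranked
    """
    ordered = [scores[i] for i in ranking]
    log_lik = 0.0
    for k in range(len(ordered)):
        # Items still available at stage k are those not yet ranked.
        remaining = ordered[k:]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        # The item chosen at stage k contributes its log-weight minus
        # the log of the total weight of the remaining items.
        log_lik += ordered[k] - log_norm
    return log_lik
```

With equal scores every ordering of n items is equally likely, so the function returns -log(n!); this gives a quick sanity check before plugging in scores from a fitted model.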