Mukhopadhyay Minerva, Li Didong, Dunson David B
Indian Institute of Technology, Kanpur, India.
Duke University, Durham, USA.
J R Stat Soc Series B Stat Methodol. 2020 Dec;82(5):1249-1271. doi: 10.1111/rssb.12390. Epub 2020 Aug 9.
Current tools for multivariate density estimation struggle when the density is concentrated near a non-linear subspace or manifold. Most approaches require the choice of a kernel, with the multivariate Gaussian kernel by far the most commonly used. Although heavy-tailed and skewed extensions have been proposed, such kernels cannot capture curvature in the support of the data. This leads to poor performance unless the sample size is very large relative to the dimension of the data. The paper proposes a novel generalization of the Gaussian distribution, which includes an additional curvature parameter. We refer to the proposed class as Fisher-Gaussian kernels, since they arise by sampling from a von Mises-Fisher density on the sphere and adding Gaussian noise. The Fisher-Gaussian density has an analytic form and is amenable to straightforward implementation within Bayesian mixture models by using Markov chain Monte Carlo sampling. We provide theory on large support and illustrate gains relative to competitors in simulated and real data applications.
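The generative description above (a von Mises-Fisher draw on the sphere plus Gaussian noise) can be illustrated with a minimal sketch in two dimensions, where the von Mises-Fisher distribution on the unit circle reduces to the von Mises distribution. The function name and the parameters `center`, `radius`, `mean_angle`, `kappa`, and `sigma` are assumptions made for this illustration; the abstract does not give the paper's parameterization, and this is not the authors' implementation.

```python
import numpy as np

def sample_fisher_gaussian_2d(n, center, radius, mean_angle, kappa, sigma, seed=None):
    """Draw n points in the plane: a von Mises draw on a circle plus Gaussian noise.

    Hypothetical parameterization (not taken from the abstract):
      center, radius  -- location and size of the circle
      mean_angle      -- mean direction of the von Mises draw, in radians
      kappa           -- concentration of the von Mises draw (0 gives uniform)
      sigma           -- standard deviation of the isotropic Gaussian noise
    """
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=float)

    # Step 1: sample directions on the unit circle (von Mises = vMF in 2-D).
    theta = rng.vonmises(mean_angle, kappa, size=n)
    on_circle = np.column_stack((np.cos(theta), np.sin(theta)))

    # Step 2: scale and shift onto the circle, then add Gaussian noise.
    noise = sigma * rng.standard_normal((n, 2))
    return center + radius * on_circle + noise

# Example: points concentrated along an arc of a circle of radius 3.
points = sample_fisher_gaussian_2d(
    n=500, center=[1.0, -2.0], radius=3.0,
    mean_angle=0.0, kappa=5.0, sigma=0.1, seed=0,
)
```

With a small `sigma`, the samples hug a curved arc rather than an ellipse, which is the kind of curvature in the support that a single Gaussian kernel cannot represent.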