Kim Hyunwoo J, Xu Jia, Vemuri Baba C, Singh Vikas
University of Wisconsin-Madison, Madison, WI 53706, USA.
University of Florida, Gainesville, FL 32611, USA.
JMLR Workshop Conf Proc. 2015 Jul;2015:1199-1208.
Statistical models for manifold-valued data capture the intrinsic geometry of the curved spaces in which the data lie and have been a topic of research for several decades. Typically, these formulations rely on geodesic curves and distances that are, in most cases, defined only locally, which makes it hard to design parametric models that hold globally on smooth manifolds. Thus, most (manifold-specific) parametric models available today assume that the data lie in a small neighborhood on the manifold. To address this 'locality' problem, we propose a novel nonparametric model that unifies multivariate general linear models (MGLMs) using multiple tangent spaces. Our framework generalizes existing work on (both Euclidean and non-Euclidean) general linear models, providing a recipe to globally extend locally defined parametric models using a mixture of local models. By grouping observations into sub-populations at multiple tangent spaces, our method provides insights into the hidden structure (geodesic relationships) in the data. This yields a framework to group observations and discover geodesic relationships between covariates and manifold-valued responses, which we call Dirichlet process mixtures of multivariate general linear models (DP-MGLM) on Riemannian manifolds. Finally, we present proof-of-concept experiments to validate our model.
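To make the idea of a linear model anchored at a single tangent space concrete, the following is a minimal sketch on the unit sphere. It is not the authors' implementation: the abstract gives no formulas, so the response model `y_hat = Exp_p(V x)`, the function names, and the example values of `p`, `V`, and `x` are all assumptions chosen for illustration. Note that the log map on the sphere breaks down at the point antipodal to the base point, which is one simple instance of the 'locality' issue that a mixture over several tangent spaces is meant to work around.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: start at p, move along tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p.copy()
    return np.cos(norm) * p + np.sin(norm) * (v / norm)

def log_map(p, q):
    """Log map on the unit sphere: tangent vector at p that points toward q.

    Not defined when q is antipodal to p (the geodesic is then not unique).
    """
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(p)
    u = q - cos_theta * p  # component of q orthogonal to p
    return theta * u / np.linalg.norm(u)

def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def predict(p, V, x):
    """Assumed local model: y_hat = Exp_p(V @ x), with columns of V tangent at p."""
    return exp_map(p, V @ x)

# Hypothetical example: base point at the north pole, two covariates.
p = np.array([0.0, 0.0, 1.0])
V = np.array([[0.5, 0.0],
              [0.0, 0.3],
              [0.0, 0.0]])  # each column is orthogonal to p
x = np.array([1.0, -2.0])

y_hat = predict(p, V, x)
residual = geodesic_distance(p, y_hat)  # equals the norm of V @ x here
```

In this sketch a single pair `(p, V)` plays the role of one local model. A mixture would keep several such pairs, each with its own base point, and assign every observation to one of them.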