Ferraccioli Federico, Menardi Giovanna
Dipartimento di Scienze Statistiche, Università degli Studi di Padova, Padua, Italy.
Adv Data Anal Classif. 2023;17(2):323-345. doi: 10.1007/s11634-022-00501-x. Epub 2022 May 5.
The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation allows for a natural, though nontrivial, generalization of the approach to the matrix-valued setting, which is increasingly common in, for example, longitudinal and multivariate spatio-temporal studies. In this work we introduce nonparametric estimators of matrix-variate distributions based on kernel methods and analyze their asymptotic properties. We also propose a generalization of the mean-shift procedure for identifying the modes of the estimated density. Given the intrinsically high dimensionality of matrix-variate data, we discuss some locally adaptive solutions to handle the problem. We test the procedure through extensive simulations, including comparisons with competing methods, and illustrate its performance in two high-dimensional real data applications.
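The idea of running mean-shift on matrix-valued observations can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a matrix-variate Gaussian kernel with fixed row and column scale matrices `U` (p x p) and `V` (q x q), so the kernel weight of observation X_i at a point x is proportional to exp(-tr(V^{-1} (X_i - x)^T U^{-1} (X_i - x)) / 2), and it omits the locally adaptive bandwidths discussed in the abstract. The function names `matrix_kernel_weights` and `mean_shift_mode` are hypothetical and chosen only for this example.

```python
import numpy as np


def matrix_kernel_weights(x, data, U_inv, V_inv):
    """Normalized Gaussian kernel weights of each observation relative to x.

    x: (p, q) current point; data: (n, p, q) observed matrices.
    U_inv, V_inv: inverses of the assumed row and column scale matrices.
    """
    diff = data - x  # shape (n, p, q), broadcast over observations
    # Quadratic form tr(V^{-1} D^T U^{-1} D) for every D = X_i - x.
    quad = np.einsum("nil,ik,nkj,jl->n", diff, U_inv, diff, V_inv)
    # Shift the exponent by its minimum for numerical stability.
    w = np.exp(-0.5 * (quad - quad.min()))
    return w / w.sum()


def mean_shift_mode(x0, data, U, V, tol=1e-8, max_iter=500):
    """Ascend from x0 towards a mode of the kernel density estimate.

    With a Gaussian kernel and a fixed bandwidth, each mean-shift step
    replaces x by the kernel-weighted average of the observations.
    """
    U_inv = np.linalg.inv(U)
    V_inv = np.linalg.inv(V)
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        w = matrix_kernel_weights(x, data, U_inv, V_inv)
        x_new = np.einsum("n,nij->ij", w, data)  # weighted mean matrix
        if np.linalg.norm(x_new - x) < tol:  # Frobenius norm of the step
            return x_new
        x = x_new
    return x
```

In a modal clustering setting, one would run `mean_shift_mode` from every observation and assign to the same group those observations whose ascent paths end at (numerically) the same mode. The choice of `U` and `V` plays the role of the bandwidth and strongly affects the number of modes found.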