Department of Statistics, Florida State University, Tallahassee, Florida, USA.
Biometrics. 2022 Sep;78(3):1067-1079. doi: 10.1111/biom.13486. Epub 2021 May 26.
In the form of multidimensional arrays, tensor data have become increasingly prevalent in modern scientific studies and biomedical applications such as computational biology, brain imaging analysis, and process monitoring systems. These data are intrinsically heterogeneous, with complex dependencies and structure. Consequently, ad hoc dimension reduction methods applied to tensor data may lack statistical efficiency and can obscure essential findings. Model-based clustering is a cornerstone of multivariate statistics and unsupervised learning; however, existing methods and algorithms are not designed for tensor-variate samples. In this article, we propose a tensor envelope mixture model (TEMM) for simultaneous clustering and multiway dimension reduction of tensor data. TEMM incorporates tensor-structure-preserving dimension reduction into mixture modeling and drastically reduces both the number of free parameters and the estimation variability. An expectation-maximization-type algorithm is developed to obtain likelihood-based estimators of the cluster means and covariances, which are jointly parameterized and constrained onto a series of lower-dimensional subspaces known as the tensor envelopes. In extensive simulation studies and a real data application, the proposed method shows encouraging empirical performance compared with existing vector and tensor clustering methods.
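To make the parameter-count argument concrete, the sketch below compares two simpler models. It is an illustrative assumption and not TEMM's actual parameterization. The first is a K-component Gaussian mixture fitted to the vectorized tensor, vec(X), with an unstructured covariance in each cluster. The second is a tensor normal mixture with cluster-specific means and a single shared separable covariance Sigma_M kron ... kron Sigma_1. The function names are hypothetical. TEMM additionally constrains the means and covariances onto tensor envelopes, which reduces the count further; that count is not reproduced here.

```python
from math import prod


def full_gmm_params(dims, K):
    """Free parameters of a K-component Gaussian mixture on vec(X), with an
    unstructured p x p covariance in every cluster (p = product of dims)."""
    p = prod(dims)
    means = K * p                   # one p-vector mean per cluster
    covs = K * p * (p + 1) // 2     # one symmetric p x p covariance per cluster
    weights = K - 1                 # mixing proportions sum to one
    return means + covs + weights


def separable_mixture_params(dims, K):
    """Free parameters of a K-component tensor normal mixture (hypothetical
    illustration) with cluster-specific means and one shared separable
    covariance Sigma_M kron ... kron Sigma_1. Each Sigma_m is symmetric
    p_m x p_m. The Kronecker product is unchanged when one factor is scaled
    by c and another by 1/c, so M - 1 scale parameters are not identifiable."""
    p = prod(dims)
    means = K * p
    covs = sum(pm * (pm + 1) // 2 for pm in dims) - (len(dims) - 1)
    weights = K - 1
    return means + covs + weights


if __name__ == "__main__":
    dims, K = (10, 10, 5), 3  # a 10 x 10 x 5 tensor, three clusters
    print(full_gmm_params(dims, K))           # 377252
    print(separable_mixture_params(dims, K))  # 1625
```

For a 10 x 10 x 5 tensor with three clusters, the unstructured covariances dominate the first count (375,750 of 377,252 parameters). Under the separable structure, the covariance needs only 123 parameters, and the cluster means (1,500) become the main cost. This is the part that envelope-based dimension reduction, as described above, targets.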