Park Seongoh, Lim Johan, Choi Hyejeong, Kwak Minjung
Department of Statistics, Seoul National University, Seoul, Korea.
Department of Statistics, Yeungnam University, Gyeongsan, Korea.
J Appl Stat. 2019 Nov 17;47(10):1739-1756. doi: 10.1080/02664763.2019.1692795. eCollection 2020.
We consider the clustering of repeatedly measured 'min-max' type interval-valued data. We treat the data as matrix-variate data and assume a separable covariance matrix for model-based clustering (M-clustering). A separable covariance matrix brings several advantages to M-clustering, including a smaller number of samples required for the procedure to be valid. In addition, our numerical study shows that this structured matrix identifies the correct number of clusters more accurately than other commonly assumed covariance structures. We apply M-clustering with various covariance structures to the longitudinal blood pressure data from the National Heart, Lung, and Blood Institute Growth and Health Study (NGHS).
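To illustrate why a separable covariance needs fewer samples, the sketch below counts free covariance parameters under two structures. It is not the authors' implementation. It assumes each observation is arranged as a p x q matrix (for example p = 2 interval endpoints by q time points, a layout chosen here only for illustration) and that "separable" means the full covariance is a Kronecker product of a p x p and a q x q matrix. The function names `kron`, `n_free_unstructured`, and `n_free_separable` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def n_free_unstructured(p, q):
    """Free parameters of an unstructured (pq x pq) symmetric covariance."""
    d = p * q
    return d * (d + 1) // 2


def n_free_separable(p, q):
    """Free parameters of a Kronecker-product covariance.

    The two factors are identified only up to a common scale
    (c * A and B / c give the same product), so one parameter is removed.
    """
    return p * (p + 1) // 2 + q * (q + 1) // 2 - 1


if __name__ == "__main__":
    # Assumed toy layout: 2 endpoints (min, max) observed at 5 time points.
    p, q = 2, 5
    print("unstructured:", n_free_unstructured(p, q))
    print("separable:   ", n_free_separable(p, q))

    # Toy factors: endpoint covariance (2 x 2) and an identity time covariance.
    endpoint_cov = [[1.0, 0.5], [0.5, 2.0]]
    time_cov = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    full_cov = kron(endpoint_cov, time_cov)
    print("full covariance shape:", len(full_cov), "x", len(full_cov[0]))
```

Under these assumptions the gap between the two counts widens quickly as the number of time points grows, which is consistent with the abstract's point that the separable structure requires fewer samples per cluster.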