Zhou Junyi, Zhang Ying, Tu Wanzhu
Department of Biostatistics and Health Data Science, Indiana University.
Department of Biostatistics, University of Nebraska Medical Center.
J Comput Graph Stat. 2023;32(3):1131-1144. doi: 10.1080/10618600.2022.2149540. Epub 2023 Jan 12.
Longitudinal data clustering is challenging because the grouping has to account for the similarity of individual trajectories in the presence of sparse and irregular times of observation. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, which are represented by B-splines for the repeatedly measured data. Extensive simulations show that the proposed method performs better in determining the number of clusters, in classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is suitable not only for clustering multivariate longitudinal data with sparse and irregular measurements but also for densely measured functional data. To this end, we provide an R package that implements such analyses. To illustrate the use of the proposed clustering method, two large data sets from real-world clinical studies are analyzed.
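The idea of agglomerating curves by a merging cost can be illustrated with a minimal sketch. It is not the authors' implementation and does not use their R package. It assumes a single outcome (the paper treats the multivariate case), a fixed clamped cubic B-spline basis, and a hypothetical merging cost defined as the increase in residual sum of squares when two groups share one fitted curve instead of having one each. The names `bspline_basis`, `rss`, and `agglomerate` are illustrative only.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at times t (Cox-de Boor recursion)."""
    t = np.asarray(t, dtype=float)
    k = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [k_i, k_{i+1}).
    B = np.array([(t >= k[i]) & (t < k[i + 1]) for i in range(len(k) - 1)],
                 dtype=float).T
    # Close the last non-empty interval on the right so t == k[-1] is covered.
    last = np.max(np.nonzero(k[:-1] < k[1:])[0])
    B[t == k[-1], last] = 1.0
    for d in range(1, degree + 1):
        nxt = np.zeros((len(t), len(k) - d - 1))
        for i in range(len(k) - d - 1):
            left = k[i + d] - k[i]
            right = k[i + d + 1] - k[i + 1]
            if left > 0:
                nxt[:, i] += (t - k[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (k[i + d + 1] - t) / right * B[:, i + 1]
        B = nxt
    return B

def rss(members, subjects, knots, degree):
    """Residual sum of squares of one least-squares spline fit to pooled subjects."""
    t = np.concatenate([subjects[i][0] for i in members])
    y = np.concatenate([subjects[i][1] for i in members])
    X = bspline_basis(t, knots, degree)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

def agglomerate(subjects, n_clusters, knots, degree=3):
    """Greedy bottom-up merging; the cost is an assumed stand-in for the paper's metric."""
    clusters = [[i] for i in range(len(subjects))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cost of merging: pooled fit error minus the two separate fit errors.
                cost = (rss(clusters[a] + clusters[b], subjects, knots, degree)
                        - rss(clusters[a], subjects, knots, degree)
                        - rss(clusters[b], subjects, knots, degree))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Simulated example: 12 subjects observed at sparse, irregular times on [0, 1].
rng = np.random.default_rng(0)
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]  # clamped cubic basis, one interior knot
subjects = []
for s in range(12):
    n_obs = int(rng.integers(6, 11))          # 6 to 10 visits per subject
    t = np.sort(rng.uniform(0, 1, size=n_obs))
    mean = np.sin(2 * np.pi * t) if s < 6 else 5 + np.cos(2 * np.pi * t)
    subjects.append((t, mean + rng.normal(0, 0.1, size=n_obs)))

clusters = agglomerate(subjects, n_clusters=2, knots=knots)
print(sorted(sorted(c) for c in clusters))
```

In this sketch the number of clusters is supplied by the user, whereas the abstract states that the proposed method also determines it. Recomputing every pairwise cost at each step keeps the code short but is not efficient, so it says nothing about the computational performance reported for the method.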