Kim Hyune-Ju, Luo Jun, Kim Jeankyung, Chen Huann-Sheng, Feuer Eric J
Department of Mathematics, Syracuse University, Syracuse, NY, 13244, U.S.A.
Stat Med. 2014 Oct 15;33(23):4087-103. doi: 10.1002/sim.6221. Epub 2014 Jun 3.
In this paper, we propose methods to cluster groups of two-dimensional data whose mean functions are piecewise linear into several clusters with common characteristics such as the same slopes. To fit segmented line regression models with common features for each possible cluster, we use a restricted least squares method. In implementing the restricted least squares method, we estimate the maximum number of segments in each cluster by using both the permutation test method and the Bayes information criterion method and then propose to use the Bayes information criterion to determine the number of clusters. For a more effective implementation of the clustering algorithm, we propose a measure of the minimum distance worth detecting and illustrate its use in two examples. We summarize simulation results to study properties of the proposed methods and also prove the consistency of the cluster grouping estimated with a given number of clusters. The presentation and examples in this paper focus on the segmented line regression model with the ordered values of the independent variable, which has been the model of interest in cancer trend analysis, but the proposed method can be applied to a general model with design points either ordered or unordered.
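As a rough illustration of the idea in the abstract, the following minimal sketch fits each candidate cluster with group-specific intercepts and shared slopes, then compares candidate groupings by an information criterion. It is not the authors' implementation. It assumes a single joinpoint at a known location `tau` (the paper estimates the number of segments by permutation tests and the Bayes information criterion), imposes the common-slope restriction by building it into the design matrix, and uses one common form of the criterion, n log(RSS/n) + p log(n). The function names `fit_cluster` and `bic`, the simulated data, and the candidate partitions are all hypothetical.

```python
import numpy as np

def fit_cluster(x, ys, tau):
    """Least squares fit for one cluster: one intercept per group, shared slopes.

    x: (n,) design points; ys: (g, n) responses, one row per group;
    tau: joinpoint location, ASSUMED known here for simplicity.
    Returns the residual sum of squares and the number of fitted parameters.
    """
    g, n = ys.shape
    # Shared columns: initial slope and slope change after the joinpoint.
    shared = np.column_stack([x, np.maximum(x - tau, 0.0)])
    X = np.zeros((g * n, g + 2))
    for i in range(g):
        X[i * n:(i + 1) * n, i] = 1.0       # group-specific intercept
        X[i * n:(i + 1) * n, g:] = shared   # slopes common to the cluster
    y = ys.reshape(-1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return rss, X.shape[1]

def bic(partition, x, ys, tau):
    """Information criterion for a partition of the groups into clusters."""
    total_rss, total_params = 0.0, 0
    for cluster in partition:
        rss, p = fit_cluster(x, ys[list(cluster)], tau)
        total_rss += rss
        total_params += p
    n_obs = ys.size
    return n_obs * np.log(total_rss / n_obs) + total_params * np.log(n_obs)

# Simulated data (hypothetical): groups 0 and 1 share one pair of slopes,
# groups 2 and 3 share another; intercepts differ across all groups.
rng = np.random.default_rng(0)
x = np.arange(20.0)
tau = 10.0
intercepts = [0.0, 3.0, 1.0, 5.0]
slopes = [(1.0, -2.0), (1.0, -2.0), (-0.5, 1.0), (-0.5, 1.0)]
ys = np.array([
    a + b1 * x + b2 * np.maximum(x - tau, 0.0) + rng.normal(0.0, 0.1, x.size)
    for a, (b1, b2) in zip(intercepts, slopes)
])

candidates = {
    "true": [(0, 1), (2, 3)],
    "single": [(0, 1, 2, 3)],
    "mismatched": [(0, 2), (1, 3)],
}
scores = {name: bic(part, x, ys, tau) for name, part in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this toy setting the grouping that matches the simulated slopes should give the smallest criterion value, because pooling groups with different slopes inflates the residual sum of squares far more than the parameter penalty saves.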