Tarpey Thaddeus
Thaddeus Tarpey is Professor, Department of Mathematics and Statistics, Wright State University, Dayton, Ohio.
Am Stat. 2007 Feb;61(1):34-40. doi: 10.1198/000313007X171016.
Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L² metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
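The statement about the L² metric can be illustrated with a small numerical sketch. It is not taken from the article: it assumes, purely for illustration, curves written in the monomial basis 1, t, t² on [0, 1], and all function names and coefficient values are hypothetical. For that basis the Gram matrix W has entries 1/(i + j + 1), and factoring W = UᵀU gives a linear transformation U under which the ordinary Euclidean distance between coefficient vectors equals the L² distance between the curves. Running k-means on the transformed coefficients would therefore amount to clustering with the L² metric, whereas the distance between raw coefficients is a different quantity.

```python
import math

def gram_matrix(p):
    """W[i][j] = integral over [0, 1] of t**i * t**j dt = 1 / (i + j + 1)."""
    return [[1.0 / (i + j + 1) for j in range(p)] for i in range(p)]

def cholesky_upper(w):
    """Return upper-triangular U with U^T U = W (W symmetric positive definite)."""
    n = len(w)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(w[i][i] - s)
            else:
                low[i][j] = (w[i][j] - s) / low[j][j]
    # Transpose the lower factor to obtain the upper factor U.
    return [[low[j][i] for j in range(n)] for i in range(n)]

def transform(u, coef):
    """Apply the linear transformation U to a coefficient vector."""
    return [sum(u[i][j] * coef[j] for j in range(len(coef))) for i in range(len(u))]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l2_distance_numeric(a, b, n=20000):
    """Midpoint-rule approximation of the L2 distance between two polynomials on [0, 1]."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        diff = sum((ai - bi) * t ** i for i, (ai, bi) in enumerate(zip(a, b)))
        total += diff * diff
    return math.sqrt(total / n)

# Two hypothetical fitted curves, given by their polynomial coefficients.
a = [1.0, -2.0, 0.5]
b = [0.0, 1.0, 3.0]

U = cholesky_upper(gram_matrix(3))
d_transformed = euclidean(transform(U, a), transform(U, b))  # distance after transforming
d_function = l2_distance_numeric(a, b)                       # distance between the curves
d_raw = euclidean(a, b)                                      # distance between raw coefficients

print(d_transformed, d_function, d_raw)
```

In this sketch the transformed distance and the numerically integrated distance should agree to within integration error, while the raw coefficient distance differs noticeably, which is one way to see why the choice of basis changes k-means results.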