Wenliang Zhong, Weike Pan, James T. Kwok, Ivor W. Tsang
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
IEEE Trans Neural Netw. 2010 Oct;21(10):1564-75. doi: 10.1109/TNN.2010.2064177. Epub 2010 Aug 23.
Clustering using the Hilbert-Schmidt independence criterion (CLUHSIC) is a recent clustering algorithm that maximizes the dependence between cluster labels and data observations according to the Hilbert-Schmidt independence criterion (HSIC). It is unique in that structural information on the cluster outputs can be easily utilized in the clustering process. However, while the choice of the loss function is known to be very important in supervised learning with structured outputs, we show in this paper that CLUHSIC implicitly uses the often inappropriate zero-one loss. We propose an extension called CLUHSICAL (which stands for "Clustering using HSIC and loss") which explicitly considers both the output dependency and the loss function. Its optimization problem has the same form as CLUHSIC, except that its partition matrix is constructed in a different manner. Experimental results on a number of datasets with structured outputs show that CLUHSICAL often outperforms CLUHSIC in terms of both structured loss and clustering accuracy.
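The dependence measure at the heart of CLUHSIC can be illustrated with the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, where K and L are kernel matrices on the data and the labels and H is the centering matrix. The sketch below is a minimal toy illustration with linear kernels, not the paper's algorithm; the data, labels, and kernel choices are assumptions for demonstration only.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2,
    where H = I - (1/n) 11^T is the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Toy illustration (hypothetical data): HSIC is large when the
# labels depend on the observations and near zero when they do not.
x = np.arange(8.0)
K = np.outer(x, x)                        # linear kernel on the data
y_dep = 2.0 * x                           # labels dependent on x
y_ind = np.array([1.0, -1.0] * 4)         # labels nearly unrelated to x
print(hsic(K, np.outer(y_dep, y_dep)))    # large
print(hsic(K, np.outer(y_ind, y_ind)))    # near zero
```

A clustering algorithm in this family searches over partition matrices (the label assignment) to maximize this dependence; CLUHSIC and CLUHSICAL differ, per the abstract, in how that partition matrix is constructed.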