Chang Jianlong, Meng Gaofeng, Wang Lingfeng, Xiang Shiming, Pan Chunhong
IEEE Trans Pattern Anal Mach Intell. 2020 Apr;42(4):809-823. doi: 10.1109/TPAMI.2018.2889949. Epub 2018 Dec 27.
Clustering is a crucial but challenging task in pattern analysis and machine learning. Existing methods often fail to integrate representation learning with clustering. To address this, we revisit the definition of the clustering task and develop Deep Self-Evolution Clustering (DSEC), which jointly learns representations and clusters data. The clustering task is recast as a binary pairwise-classification problem: estimating whether a pair of patterns is similar. Specifically, the similarity between two patterns is defined as the dot product of their indicator features, which are generated by a deep neural network (DNN). To learn informative representations for clustering, clustering constraints are imposed on the indicator features so that specific concepts are represented by specific representations. Since ground-truth similarities are unavailable in clustering, an alternating iterative algorithm called Self-Evolution Clustering Training (SECT) is presented, which alternates between selecting similar and dissimilar pairs of patterns and training the DNN on them. As a result, the indicator features tend toward one-hot vectors, and each pattern can be assigned to a cluster by locating the largest response in its learned indicator feature. Extensive experiments show that DSEC consistently outperforms current models on twelve popular image, text, and audio datasets.
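The pairwise formulation described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the softmax followed by L2 normalisation in `to_indicator` stands in for the unspecified clustering constraints, the thresholds `upper` and `lower` used to select similar and dissimilar pairs are hypothetical values, and the DNN training step of SECT is omitted entirely (the `logits` array simply plays the role of network outputs).

```python
import numpy as np

def to_indicator(logits):
    """Map raw outputs to non-negative, unit-length rows.

    Assumption: softmax + L2 normalisation is used here only so that
    dot products fall in [0, 1]; the abstract does not state the
    actual constraints.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs / np.linalg.norm(probs, axis=1, keepdims=True)

def pairwise_similarity(features):
    """Similarity of every pair, defined as the dot product of indicator features."""
    return features @ features.T

def select_pairs(sim, upper=0.9, lower=0.1):
    """Pseudo-label pairs: 1 = similar, 0 = dissimilar, -1 = left out.

    `upper` and `lower` are hypothetical thresholds chosen for illustration.
    """
    labels = np.full(sim.shape, -1, dtype=int)
    labels[sim >= upper] = 1
    labels[sim <= lower] = 0
    return labels

def assign_clusters(features):
    """Cluster index = position of the largest response in each indicator feature."""
    return features.argmax(axis=1)

# Stand-in for DNN outputs on four patterns and three clusters.
logits = np.array([[8.0, 0.0, 0.0],
                   [7.0, 0.5, 0.0],
                   [0.0, 0.0, 9.0],
                   [1.0, 1.2, 0.9]])

feats = to_indicator(logits)
sim = pairwise_similarity(feats)
pseudo = select_pairs(sim)      # labels a training step would consume
clusters = assign_clusters(feats)
print(clusters.tolist())        # → [0, 0, 2, 1]
```

In this toy input the first two patterns are selected as a similar pair, the first and third as a dissimilar pair, and pairs involving the ambiguous fourth pattern are left out of the selection. In an alternating scheme of the kind the abstract describes, the selected pairs would then be used to train the network before pairs are selected again.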