Athens Information Technology, Paiania, Greece.
IEEE Trans Pattern Anal Mach Intell. 2011 Feb;33(2):279-93. doi: 10.1109/TPAMI.2010.85.
We present a novel optimization-based method for the combination of cluster ensembles for the class of problems with intracluster criteria, such as Minimum-Sum-of-Squares-Clustering (MSSC). We propose a simple and efficient algorithm-called EXAMCE-for this class of problems that is inspired from a Set-Partitioning formulation of the original clustering problem. We prove some theoretical properties of the solutions produced by our algorithm, and in particular that, under general assumptions, though the algorithm recombines solution fragments so as to find the solution of a Set-Covering relaxation of the original formulation, it is guaranteed to find better solutions than the ones in the ensemble. For the MSSC problem in particular, a prototype implementation of our algorithm found a new better solution than the previously best known for 21 of the test instances of the 40-instance TSPLIB benchmark data sets used in [1], [2], and [3], and found a worse-quality solution than the best known only five times. For other published benchmark data sets where the optimal MSSC solution is known, we match them. The algorithm is particularly effective when the number of clusters is large, in which case it is able to escape the local minima found by K-means type algorithms by recombining the solutions in a Set-Covering context. We also establish the stability of the algorithm with extensive computational experiments, by showing that multiple runs of EXAMCE for the same clustering problem instance produce high-quality solutions whose Adjusted Rand Index is consistently above 0.95. Finally, in experiments utilizing external criteria to compute the validity of clustering, EXAMCE is capable of producing high-quality results that are comparable in quality to those of the best known clustering algorithms.
我们提出了一种新的基于优化的方法,用于组合具有簇内标准的簇集成,例如最小平方和聚类(MSSC)。我们为这类问题提出了一种简单而有效的算法——EXAMCE,它的灵感来自于原始聚类问题的集合划分公式。我们证明了我们算法产生的解的一些理论性质,特别是在一般假设下,尽管算法通过重新组合解决方案片段来寻找原始公式的集合覆盖松弛的解决方案,但它保证可以找到比集成中更好的解决方案。对于 MSSC 问题,我们的算法原型实现找到了比 [1]、[2] 和 [3] 中使用的 40 个 TSPLIB 基准数据集的 21 个测试实例中以前最好的已知解决方案更好的新解决方案,并且仅五次找到了比最佳已知解决方案质量差的解决方案。对于其他已发布的最佳 MSSC 解决方案已知的基准数据集,我们与之匹配。当聚类的数量很大时,该算法特别有效,因为它能够通过在集合覆盖上下文中重新组合解决方案来避免 K-均值类型算法找到的局部最小值。我们还通过展示 EXAMCE 对相同聚类问题实例的多次运行可以产生高质量的解决方案,其调整兰德指数始终高于 0.95,从而在广泛的计算实验中证明了算法的稳定性。最后,在利用外部标准计算聚类有效性的实验中,EXAMCE 能够生成高质量的结果,其质量与最佳已知聚类算法相当。