IEEE Trans Neural Netw Learn Syst. 2013 Jul;24(7):1166-73. doi: 10.1109/TNNLS.2013.2247058.
The existence of multiple solutions in clustering, and in hierarchical clustering in particular, is often ignored in practical applications. However, this is a non-trivial problem, as different data orderings can result in different cluster sets that, in turn, may lead to different interpretations of the same data. The method presented here offers a solution to this issue. It is based on the definition of an equivalence relation over dendrograms that makes it possible to generate all, and only, the significantly different dendrograms for the same dataset, thus reducing the computational complexity from exponential, when all possible dendrograms are considered, to polynomial. Experimental results in the neuroimaging and bioinformatics domains show the effectiveness of the proposed method.
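The ordering problem described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method (the abstract does not specify the equivalence relation); it is a minimal, hypothetical complete-linkage implementation on distinct 1-D points, assuming ties between equally distant cluster pairs are broken by input order. The function names `complete_linkage` and `cluster_set` are introduced here for illustration only.

```python
from itertools import combinations


def complete_linkage(points):
    """Naive agglomerative clustering of distinct 1-D points (complete linkage).

    Assumption for illustration: when several cluster pairs are equally
    close, the first pair encountered in input order is merged.
    """
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: distance between the farthest members.
            d = max(abs(x - y) for x in a for y in b)
            # Strict "<" keeps the first minimal pair, so order matters on ties.
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((merged, d))
    return merges


def cluster_set(merges):
    """Return the set of non-singleton clusters that appear in the dendrogram."""
    return {cluster for cluster, _ in merges}


# Points 0, 1, 2 have a tie: d(0, 1) == d(1, 2) == 1.
forward = cluster_set(complete_linkage([0, 1, 2]))
backward = cluster_set(complete_linkage([2, 1, 0]))

print(forward == backward)  # False: the same data yields different cluster sets
```

With the input order `[0, 1, 2]` the dendrogram contains the cluster `{0, 1}`, whereas the reversed order produces `{1, 2}` instead. Both are valid complete-linkage solutions for the same data, which is the kind of ambiguity the abstract refers to.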