Dipartimento di Matematica ed Informatica, University of Udine, Udine, Italy.
PLoS One. 2010 Dec 2;5(12):e14067. doi: 10.1371/journal.pone.0014067.
Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the arithmetic-harmonic cut. We show that the problem of finding such a cut is NP-hard and APX-hard but is fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets including a cancer type dataset and a corona virus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as k-Means, Graclus and Normalized-Cut. The arithmetic-harmonic cut metric overcoming difficulties other hierarchical methods have in representing both intercluster differences and intracluster similarities.
聚类,特别是层次聚类,是一种在广泛的知识领域中理解和分析数据的重要方法,在数据可以在进化背景下分类的系统中具有显著的实用性。本文介绍了一个新的层次聚类问题,由我们称之为算术-调和切割的新目标函数定义。我们表明,找到这样一个切割的问题是 NP 难和 APX 难的,但却是固定参数可处理的,这表明尽管该问题不太可能有多项式时间算法(即使是近似的),但基于精确参数化和局部搜索的技术可能会产生可行的算法。为此,我们为该问题实现了一个遗传算法,并在包括癌症类型数据集和冠状病毒数据集在内的多个数据集上展示了算术调和切割的有效性。与 k-Means、Graclus 和 Normalized-Cut 等当前使用的层次聚类技术相比,我们显示出了有利的性能。算术调和切割度量克服了其他层次方法在表示簇间差异和簇内相似性方面的困难。