Liang Shaoyi, Han Deqiang
MOE KLINNS Lab, Institute of Integrated Automation, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Sensors (Basel). 2017 Sep 28;17(10):2226. doi: 10.3390/s17102226.
Closeness measures are crucial to clustering methods. In most traditional clustering methods, the closeness between data points or clusters is measured by geometric distance alone. These metrics quantify closeness based only on the positions of the data points concerned in the feature space, and they may cause problems in clustering tasks with arbitrarily shaped clusters and clusters of different densities. In this paper, we first propose a novel Closeness Measure between data points based on the Neighborhood Chain (CMNC). Instead of using geometric distances alone, CMNC measures the closeness between two data points by quantifying how difficult it is for one of them to reach the other through a chain of neighbors. Furthermore, based on CMNC, we propose a clustering ensemble framework that combines CMNC with geometric-distance-based closeness measures in order to exploit the advantages of both. In this framework, the "bad data points" that are hard to cluster correctly are first identified; different closeness measures are then applied to different types of data points to obtain a unified clustering result. By fusing different closeness measures, the framework achieves not only better clustering results on complicated clustering tasks but also higher efficiency.
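The abstract states only the idea behind CMNC, not its formula. As a rough illustration of what "the difficulty for one data point to reach another through a chain of neighbors" could mean, the sketch below assumes one possible interpretation: the difficulty is the smallest k for which a chain of k-nearest-neighbor hops (Euclidean distance, ties broken by index) leads from the source point to the target. This is a hypothetical stand-in, not the CMNC definition from the paper, and the function names `neighbor_order`, `reachable`, and `chain_difficulty` are made up for this example.

```python
import math
from collections import deque


def neighbor_order(points):
    """For each point, list all other indices sorted by (Euclidean distance, index)."""
    order = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        order.append(others)
    return order


def reachable(order, src, dst, k):
    """Breadth-first search: can dst be reached from src if every hop
    must go to one of the current point's k nearest neighbors?"""
    seen = {src}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        if i == dst:
            return True
        for j in order[i][:k]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False


def chain_difficulty(points, src, dst):
    """Hypothetical chain-based closeness: the smallest k such that a
    k-nearest-neighbor chain links src to dst (smaller means closer)."""
    if src == dst:
        return 0
    order = neighbor_order(points)
    for k in range(1, len(points)):
        if reachable(order, src, dst, k):
            return k
    # Not reached in practice: with k = n - 1 every point neighbors every other.
    raise ValueError("no chain found")


# An elongated cluster along the x-axis plus one isolated point above the origin.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 3)]
print(chain_difficulty(pts, 0, 4))  # 2: reached by hopping along the line
print(chain_difficulty(pts, 0, 5))  # 4: only reached once k is large
```

In this toy example, point 4 is geometrically farther from point 0 (distance 4) than point 5 is (distance 3), yet it is much easier to reach through neighbors. This is the kind of shape-aware behavior the abstract attributes to chain-based closeness. Note that this particular measure is directional, so `chain_difficulty(pts, a, b)` need not equal `chain_difficulty(pts, b, a)`.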