Du Mingjing, Wu Fuyu
School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China.
Entropy (Basel). 2022 Nov 4;24(11):1606. doi: 10.3390/e24111606.
Clustering algorithms can be divided into five categories: partitioning, hierarchical, model-based, density-based, and grid-based. Among them, grid-based clustering is highly efficient at handling spatial data. However, traditional grid-based clustering algorithms still face several problems: (1) parameter tuning: density thresholds are difficult to adjust; (2) data challenges: clusters with overlapping regions and varying densities are handled poorly. We propose a new grid-based clustering algorithm, named GCBD, to address these problems. First, the density estimate of each node is defined using the standard grid structure. Second, GCBD uses an iterative boundary detection strategy to distinguish core nodes from boundary nodes. Finally, two clustering strategies are combined to group the core nodes and to assign the boundary nodes. Experiments on 18 datasets demonstrate that the proposed algorithm outperforms 6 grid-based competitors.
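The three steps named in the abstract (grid density estimation, core/boundary separation, then grouping and assignment) can be illustrated with a minimal sketch. This is not the GCBD algorithm itself, whose definitions the abstract does not give. It is a toy 2-D illustration under stated assumptions: cell density is a plain point count, a cell is treated as core when all eight neighbouring cells are occupied (a single-pass stand-in for the paper's iterative boundary detection), core cells are grouped by connected components, and each boundary cell joins its densest adjacent core cell. The names `grid_cluster`, `neighbors`, and `cell_size` are hypothetical.

```python
from collections import Counter, deque

# 8-connected neighbourhood offsets for a 2-D grid.
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def neighbors(cell):
    """Return the 8 grid cells surrounding `cell`."""
    x, y = cell
    return [(x + dx, y + dy) for dx, dy in OFFSETS]


def grid_cluster(points, cell_size):
    """Toy grid clustering of 2-D points.

    Returns {cell: label}; label -1 marks a cell left unassigned.
    """
    # Step 1: density of a cell = number of points that fall in it.
    density = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

    # Step 2 (assumed rule, not the paper's): a cell is core if all 8
    # neighbours are occupied; every other occupied cell is a boundary cell.
    core = {c for c in density if all(n in density for n in neighbors(c))}

    # Step 3a: group core cells into connected components (breadth-first search).
    labels = {}
    next_label = 0
    for start in sorted(core):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for n in neighbors(cell):
                if n in core and n not in labels:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1

    # Step 3b: attach each boundary cell to its densest adjacent core cell.
    for cell in density:
        if cell in core:
            continue
        adjacent = [n for n in neighbors(cell) if n in core]
        if adjacent:
            labels[cell] = labels[max(adjacent, key=density.__getitem__)]
        else:
            labels[cell] = -1
    return labels
```

Calling `grid_cluster(points, cell_size=1.0)` on a list of `(x, y)` tuples returns one label per occupied cell. The all-neighbours-occupied rule avoids a numeric density threshold, which is in the spirit of the parameter-tuning problem the abstract raises, but it is only a convenient assumption for this sketch.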