Aksac Alper, Ozyer Tansel, Alhajj Reda
Department of Computer Science, University of Calgary, Calgary, AB, Canada.
TOBB University of Economics and Technology, Ankara, Turkey.
Data Brief. 2019 Nov 29;28:104899. doi: 10.1016/j.dib.2019.104899. eCollection 2020 Feb.
Cluster analysis plays a significant role in automating the knowledge discovery process in spatial data mining. A good clustering algorithm satisfies two essential conditions: high intra-cluster similarity and low inter-cluster similarity. Maximizing intra-cluster (within-cluster) similarity yields small distances between data points in the same cluster, while minimizing inter-cluster (between-cluster) similarity yields large distances between data points in different clusters. We previously presented a graph-based spatial clustering algorithm, CutESC (Cut-Edge for Spatial Clustering). The data presented in this article relate to and support the research paper entitled "CutESC: Cutting edge spatial clustering technique based on proximity graphs" (Aksac et al., 2019) [1], which provides the interpretation of the data presented here. In this article, we share the parametric version of our algorithm, named CutESC-P, the best parameter settings for the experiments, and additional analyses and information related to the algorithm (CutESC) proposed in [1].
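The two conditions named in the abstract can be made concrete with a small sketch. The code below is not CutESC or CutESC-P; it is only an illustration, assuming Euclidean distance and a hypothetical helper name `mean_intra_inter`, of how the mean within-cluster and between-cluster distances could be compared for a given labelling.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Hypothetical helper for illustration only; not part of CutESC.
    """
    intra, inter = [], []
    # Visit every unordered pair of points once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Same label: within-cluster pair; otherwise between-cluster pair.
        (intra if lp == lq else inter).append(dist(p, q))

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)


# Toy data (assumed): two tight pairs of points placed far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter(points, labels)
print(intra, inter)
```

For a labelling that meets both conditions, the first value should be clearly smaller than the second, as it is for this toy data.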