Salem Semeh Ben, Naouali Sami, Chtourou Zied
Science and Technologies for Defense (STD) Laboratory, Military Academy of Fondouk Jedid, Nabeul, Tunisia.
Polytechnic School of Tunisia, Rue El Khawarizmi, Al Marsá, B.P. 743, 2078 Tunis, Tunisia.
Int J Mach Learn Cybern. 2021;12(7):2069-2090. doi: 10.1007/s13042-021-01293-w. Epub 2021 Mar 27.
The categorical clustering problem has attracted much attention, especially in recent decades, since many real-world applications produce categorical data. The k-modes algorithm, proposed in 1998, and its many variants have been widely used in this context. However, they suffer from a major limitation related to the update of the modes in each iteration: in the last step of these algorithms, the mode is selected randomly, although many candidate modes can be identified. In this paper, a rough density mode selection method is proposed to identify the adequate modes among a list of candidates in each iteration of k-modes. The proposed method, called Density Rough k-Modes (DRk-M), was evaluated on real-world datasets extracted from the UCI Machine Learning Repository, the Global Terrorism Database (GTD) and a set of collected Tweets. DRk-M was also compared with many state-of-the-art clustering methods and showed great efficiency.
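The tie problem described above can be made concrete with a small sketch. In k-modes, the total simple-matching dissimilarity between a mode and the members of its cluster is minimised by independently taking a most-frequent value for each attribute. When an attribute has several equally frequent values, every combination of them is an equally good mode, and plain k-modes keeps one arbitrarily. The code below enumerates those candidate modes and then ranks them with a `local_density` score, which counts how many cluster members lie within `radius` mismatches of the candidate. This score, along with `radius` and `select_mode`, is a hypothetical stand-in chosen for illustration. It is not the rough density criterion of DRk-M, which this sketch does not attempt to reproduce.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity used by k-modes: number of differing attributes."""
    return sum(x != y for x, y in zip(a, b))


def candidate_modes(cluster: List[Record]) -> List[Record]:
    """Every combination of per-attribute most-frequent values.

    Each candidate minimises the summed simple-matching dissimilarity to the
    cluster, so plain k-modes has no principled reason to prefer one of them.
    """
    per_attr = []
    for j in range(len(cluster[0])):
        counts = Counter(rec[j] for rec in cluster)
        top = max(counts.values())
        # Tied values are kept in first-seen order (Counter preserves insertion order).
        per_attr.append([v for v, c in counts.items() if c == top])
    return [tuple(combo) for combo in product(*per_attr)]


def local_density(candidate: Record, cluster: List[Record], radius: int = 1) -> int:
    """HYPOTHETICAL density score: cluster members within `radius` mismatches."""
    return sum(hamming(candidate, rec) <= radius for rec in cluster)


def select_mode(cluster: List[Record], radius: int = 1) -> Record:
    """Pick the densest candidate instead of a random one (ties keep enumeration order)."""
    candidates = candidate_modes(cluster)
    return max(candidates, key=lambda c: local_density(c, cluster, radius))
```

For example, the cluster `[("a","x","p"), ("a","x","p"), ("b","y","p"), ("b","z","p")]` yields two candidates, `("a","x","p")` and `("b","x","p")`. Both have a total dissimilarity of 4 to the cluster, so k-modes cannot distinguish them. With `radius=1`, the density score prefers `("b","x","p")`, and with `radius=0` it prefers `("a","x","p")`. The choice of density definition therefore decides which mode survives, which is the design question that DRk-M addresses.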