Atif Muhammad, Shafiq Muhammad, Leisch Friedrich
Department of Statistics, University of Peshawar, Peshawar, Pakistan.
Institute of Numerical Sciences, Kohat University of Science and Technology, Kohat, Pakistan.
J Appl Stat. 2021 Dec 7;50(4):1017-1035. doi: 10.1080/02664763.2021.2008882. eCollection 2023.
Clustering is widely regarded as the most prominent unsupervised learning problem in data mining. It deals with identifying notable structures in unlabeled datasets. Clustering of dynamic data streams now plays a vital role in policy-making, and researchers are paying particular attention to monitoring how clustering solutions evolve over time. Data streams evolve continually, with different sources generating data items over time. A clustering solution over such a stream is therefore not stationary: it changes with the influx of new data items. This paper presents a comprehensive study of algorithms for tracing the evolution of clusters over time in cumulative datasets. To demonstrate the applications and significance of tracing cluster evolution, we implement the MONIC algorithm in R. The article illustrates how data segmentation of dynamic streams is carried out and shows applications of monitoring changes in clustering solutions using published real-life datasets.
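To give a concrete sense of what tracing cluster evolution involves, the following Python sketch classifies transitions between two clusterings of a cumulative dataset using an overlap criterion in the spirit of MONIC. It is not the R implementation described in the abstract. The function names (`overlap`, `classify_transitions`), the threshold values `tau` and `tau_split`, and the toy data are assumptions made for illustration, and the age weighting of records used in MONIC is omitted, so every record counts equally.

```python
from typing import Dict, List, Set, Tuple


def overlap(old: Set[int], new: Set[int]) -> float:
    """Share of the old cluster's records that also belong to the new cluster."""
    return len(old & new) / len(old) if old else 0.0


def classify_transitions(
    old: Dict[str, Set[int]],
    new: Dict[str, Set[int]],
    tau: float = 0.6,        # illustrative match threshold (assumption)
    tau_split: float = 0.25  # illustrative split threshold (assumption)
) -> Tuple[Dict[str, Tuple[str, List[str]]], List[str]]:
    """Label each old cluster as survived, absorbed, split or disappeared,
    and list the new clusters that emerged without a predecessor."""
    # Step 1: match each old cluster to its best new cluster, if the overlap is large enough.
    matches: Dict[str, str] = {}
    for x, members in old.items():
        best = max(new, key=lambda y: overlap(members, new[y]), default=None)
        if best is not None and overlap(members, new[best]) >= tau:
            matches[x] = best

    # Step 2: turn matches (or the lack of one) into transition labels.
    result: Dict[str, Tuple[str, List[str]]] = {}
    for x, members in old.items():
        if x in matches:
            y = matches[x]
            rivals = [z for z, target in matches.items() if target == y and z != x]
            result[x] = ("absorbed" if rivals else "survived", [y])
            continue
        # No single match: check whether several new clusters jointly cover the old one.
        parts = [y for y, ym in new.items() if overlap(members, ym) >= tau_split]
        union: Set[int] = set().union(*(new[y] for y in parts))
        if len(parts) >= 2 and overlap(members, union) >= tau:
            result[x] = ("split", sorted(parts))
        else:
            result[x] = ("disappeared", [])

    # Step 3: new clusters that are not the target of any transition have emerged.
    used = {y for _, targets in result.values() for y in targets}
    emerged = sorted(y for y in new if y not in used)
    return result, emerged


# Toy data: record identifiers grouped by cluster label at two time points.
old_clusters = {"A": {1, 2, 3, 4}, "B": {5, 6, 7, 8}, "C": {9, 10},
                "D": {11, 12, 13, 14}, "E": {30, 31}}
new_clusters = {"P": {1, 2, 3, 4, 9, 10, 15}, "Q": {5, 6}, "R": {7, 8},
                "S": {20, 21, 22}, "T": {11, 12, 13, 14, 16}}

transitions, emerged = classify_transitions(old_clusters, new_clusters)
for label, (kind, targets) in transitions.items():
    print(label, kind, targets)
print("emerged:", emerged)
```

In the toy data, clusters A and C are both matched to P and are therefore reported as absorbed, B is divided between Q and R, D survives as T, E has no successor, and S appears without a predecessor. Normalising the overlap by the size of the old cluster makes the measure asymmetric, which is what allows a small old cluster to be absorbed into a much larger new one.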