An incremental clustering method based on the boundary profile.

Affiliations

Department of Computer Science & Technology, Xi'an Jiaotong University, Xi'an, P.R. China.

China Xi'an Satellite Control Center, Xi'an, P.R. China.

Publication information

PLoS One. 2018 Apr 20;13(4):e0196108. doi: 10.1371/journal.pone.0196108. eCollection 2018.

DOI:10.1371/journal.pone.0196108

PMID:29677201

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5909898/

Abstract

Many important applications, such as financial transaction administration, satellite monitoring, network flow monitoring, and web information processing, generate data continuously, so data mining results keep evolving as new data arrive. For clustering, it is clearly better to update the existing clustering results incrementally than to recluster all of the data from scratch, and incremental clustering is an essential way to cluster ever-growing big data. This paper proposes a boundary-profile-based incremental clustering (BPIC) method that finds arbitrarily shaped clusters in dynamically growing datasets. Instead of keeping all of the data, BPIC represents the existing clustering results with a collection of boundary profiles and discards the inner points of clusters, which greatly reduces both time and storage costs. To identify the boundary profiles, the paper presents a boundary-vector-based boundary point detection (BV-BPD) algorithm that summarizes the structure of the existing clusters. BPIC processes each new point online and updates the clustering results in batch mode. When a new point arrives, BPIC either labels it immediately or temporarily puts it into a bucket, depending on the relationship between the point and the boundary profiles. The bucket separates noise from potential seeds of new clusters and alleviates the effect of data order. When the bucket is full, BPIC clusters the data in it and updates the clustering results. As a result, BPIC is insensitive to noise and to the order of new data, which is critical for the robustness of the incremental clustering process. In the experiments, BV-BPD is compared with the state-of-the-art boundary point detection method, and the results show that BV-BPD outperforms it.
Additionally, BPIC and two other incremental clustering methods are compared in terms of clustering quality and time and space efficiency. The experimental results indicate that BPIC obtains satisfactory clustering results on a large dataset with higher time and space efficiency.
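The abstract outlines the BPIC loop but does not define the boundary vector or the membership test, so the Python sketch below is only an illustration under our own assumptions, not the authors' BV-BPD or BPIC algorithms. We assume the "boundary vector" of a point is the mean unit vector toward its k nearest neighbours (short when neighbours surround the point, long when they lie on one side); a new point is labelled if it lies within a radius `eps` of a cluster's profile or is surrounded by that profile; and a full bucket is clustered by simple eps-connectivity, with groups smaller than `min_pts` dropped as noise. All names (`boundary_score`, `boundary_profile`, `BPICSketch`, `eps`, `bucket_size`, `min_pts`, `inside`) are hypothetical, the sketch is 2-D only, and it omits cluster merging and profile updates.

```python
# Illustrative sketch only; NOT the BV-BPD/BPIC algorithms from the paper.
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def boundary_score(p: Point, points: List[Point], k: int) -> float:
    """Length of the mean unit vector from p to its k nearest other points.

    Hypothetical "boundary vector": near 0 when neighbours surround p,
    near 1 when they all lie on one side of it.
    """
    nearest = sorted((q for q in points if q != p), key=lambda q: math.dist(p, q))[:k]
    sx = sy = 0.0
    for q in nearest:
        d = math.dist(p, q)  # > 0 because q != p
        sx += (q[0] - p[0]) / d
        sy += (q[1] - p[1]) / d
    return math.hypot(sx, sy) / max(len(nearest), 1)


def boundary_profile(points: List[Point], k: int = 8, threshold: float = 0.3) -> List[Point]:
    """Keep only the points whose boundary score reaches the threshold."""
    return [p for p in points if boundary_score(p, points, k) >= threshold]


class BPICSketch:
    """Toy incremental loop: label new points against profiles, bucket the rest."""

    def __init__(self, eps: float, bucket_size: int, min_pts: int, inside: float = 0.3) -> None:
        self.eps = eps                  # "near a profile" radius (assumed parameter)
        self.bucket_size = bucket_size  # flush the bucket when it holds this many points
        self.min_pts = min_pts          # smallest bucket group promoted to a new cluster
        self.inside = inside            # enclosure threshold for the heuristic in insert()
        self.profiles: Dict[int, List[Point]] = {}
        self.bucket: List[Point] = []
        self._next_id = 0

    def add_cluster(self, members: List[Point]) -> int:
        """Store only the boundary profile of a cluster; inner points are discarded."""
        if not members:
            raise ValueError("a cluster needs at least one point")
        cid, self._next_id = self._next_id, self._next_id + 1
        self.profiles[cid] = boundary_profile(members) or list(members)
        return cid

    def insert(self, p: Point) -> Optional[int]:
        """Return a cluster id at once, or None if p was put into the bucket."""
        for cid, profile in self.profiles.items():
            near_edge = any(math.dist(p, b) <= self.eps for b in profile)
            # Heuristic: p is enclosed if the profile points surround it on all sides.
            enclosed = boundary_score(p, profile, len(profile)) < self.inside
            if near_edge or enclosed:
                return cid  # first matching cluster wins in this sketch
        self.bucket.append(p)
        if len(self.bucket) >= self.bucket_size:
            self.flush()
        return None

    def flush(self) -> List[int]:
        """Batch step: eps-connected groups of >= min_pts bucket points become
        new clusters; smaller groups are treated as noise and dropped."""
        pending, self.bucket = self.bucket, []
        new_ids: List[int] = []
        while pending:
            group = [pending.pop()]
            i = 0
            while i < len(group):  # breadth-first growth of the eps-component
                near = [q for q in pending if math.dist(group[i], q) <= self.eps]
                for q in near:
                    pending.remove(q)
                group.extend(near)
                i += 1
            if len(group) >= self.min_pts:
                new_ids.append(self.add_cluster(group))
        return new_ids
```

With `eps=1.5`, `bucket_size=4` and `min_pts=3`, seeding the sketch with a 5 by 5 integer grid keeps only its 16 perimeter points. The point (5.0, 2.0) is then labelled at once because it lies within `eps` of (4.0, 2.0), and the grid centre (2.0, 2.0) is labelled because the profile surrounds it. Three mutually close points near (10, 10) wait in the bucket; when a fourth, isolated point fills it, the flush promotes the three to a new cluster and drops the isolated one as noise. The enclosure test is a simple heuristic suited to roughly convex clusters; the arbitrarily shaped clusters the paper targets would need a more careful membership rule.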

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/2010f7fea431/pone.0196108.g001.jpg

Similar articles

An incremental clustering method based on the boundary profile.

PLoS One. 2018 Apr 20;13(4):e0196108. doi: 10.1371/journal.pone.0196108. eCollection 2018.

An effective density-based clustering and dynamic maintenance framework for evolving medical data streams.

Int J Med Inform. 2019 Jun;126:176-186. doi: 10.1016/j.ijmedinf.2019.03.016. Epub 2019 Mar 28.

Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence.

Neural Netw. 2020 Jan;121:208-228. doi: 10.1016/j.neunet.2019.08.033. Epub 2019 Sep 9.

Solving text clustering problem using a memetic differential evolution algorithm.

PLoS One. 2020 Jun 11;15(6):e0232816. doi: 10.1371/journal.pone.0232816. eCollection 2020.

A Fast Projection-Based Algorithm for Clustering Big Data.

Interdiscip Sci. 2019 Sep;11(3):360-366. doi: 10.1007/s12539-018-0294-3. Epub 2018 Jun 7.

A differential privacy protecting K-means clustering algorithm based on contour coefficients.

PLoS One. 2018 Nov 21;13(11):e0206832. doi: 10.1371/journal.pone.0206832. eCollection 2018.

A vector reconstruction based clustering algorithm particularly for large-scale text collection.

Neural Netw. 2015 Mar;63:141-55. doi: 10.1016/j.neunet.2014.10.012. Epub 2014 Dec 9.

Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster.

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5680-5693. doi: 10.1109/TNNLS.2024.3382033. Epub 2025 Feb 28.

Retro: concept-based clustering of biomedical topical sets.

Bioinformatics. 2014 Nov 15;30(22):3240-8. doi: 10.1093/bioinformatics/btu514. Epub 2014 Jul 29.

Incremental Interval Type-2 Fuzzy Clustering of Data Streams using Single Pass Method.

Sensors (Basel). 2020 Jun 5;20(11):3210. doi: 10.3390/s20113210.
