An incremental clustering method based on the boundary profile.

Affiliations

Department of Computer Science & Technology, Xi'an Jiaotong University, Xi'an, P.R. China.

China Xi'an Satellite Control Center, Xi'an, P.R. China.

Publication information

PLoS One. 2018 Apr 20;13(4):e0196108. doi: 10.1371/journal.pone.0196108. eCollection 2018.

Abstract

Many important applications generate data continuously, including financial transaction administration, satellite monitoring, network flow monitoring, and web information processing. Data mining results therefore evolve as new data are generated. For the clustering task, it is better to update the existing clustering results incrementally than to recluster all of the data from scratch, which makes incremental clustering an essential approach to clustering growing Big Data. This paper proposes a boundary-profile-based incremental clustering (BPIC) method that finds arbitrarily shaped clusters in dynamically growing datasets. The method represents the existing clustering results with a collection of boundary profiles and discards the inner points of clusters rather than keeping all of the data, which greatly reduces both time and storage costs. To identify the boundary profile, this paper presents a boundary-vector-based boundary point detection (BV-BPD) algorithm that summarizes the structure of the existing clusters. The BPIC method processes each new point in an online fashion and updates the clustering results in batch mode. When a new point arrives, the BPIC method either labels it immediately or temporarily places it in a bucket, depending on the relationship between the new point and the boundary profiles. The bucket is used to distinguish noise from the potential seeds of new clusters and to alleviate the effects of data order. When the bucket is full, the BPIC method clusters the data within it and updates the clustering results. As a result, the BPIC method is insensitive to noise and to the order of new data, which is critical for the robustness of the incremental clustering process. In the experiments, the boundary point detection algorithm BV-BPD is compared with the state-of-the-art method, and the results show that BV-BPD performs better. Additionally, the performance of BPIC and two other incremental clustering methods is investigated in terms of clustering quality, time efficiency, and space efficiency. The experimental results indicate that the BPIC method obtains a satisfactory clustering result on a large dataset with higher time and space efficiency.
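The label-or-bucket flow described in the abstract can be illustrated with a small sketch. This is not the BPIC or BV-BPD algorithm from the paper. It is a toy stand-in under loudly stated assumptions: the class name `BucketedIncrementalClusterer` and the parameters `eps`, `min_pts`, and `bucket_size` are hypothetical, "close to a boundary profile" is approximated by a plain Euclidean distance threshold, each cluster's profile is simply the list of points that formed it (no boundary detection is performed), and the bucket is clustered by connected components of an `eps`-neighborhood graph.

```python
import math
from collections import deque


class BucketedIncrementalClusterer:
    """Toy illustration of the label-or-bucket flow; NOT the paper's BPIC method."""

    def __init__(self, eps, min_pts, bucket_size):
        self.eps = eps                  # assumption: distance threshold for "close to a profile"
        self.min_pts = min_pts          # assumption: smallest group accepted as a new cluster
        self.bucket_size = bucket_size  # assumption: bucket capacity that triggers a batch update
        self.profiles = {}              # cluster id -> representative points (stand-in for a boundary profile)
        self.bucket = []                # points that could not be labelled yet
        self.noise = []                 # points rejected when the bucket was clustered
        self._next_id = 0

    def _nearest_cluster(self, p):
        # Find the closest representative point over all clusters.
        best_id, best_d = None, float("inf")
        for cid, pts in self.profiles.items():
            for q in pts:
                d = math.dist(p, q)
                if d < best_d:
                    best_id, best_d = cid, d
        return best_id if best_d <= self.eps else None

    def add(self, p):
        """Return a cluster id if p is labelled at once, else None (p went to the bucket)."""
        cid = self._nearest_cluster(p)
        if cid is not None:
            return cid  # simplification: the labelled point is discarded, the profile is unchanged
        self.bucket.append(p)
        if len(self.bucket) >= self.bucket_size:
            self._flush()
        return None

    def _flush(self):
        # Batch step: group bucketed points into connected components of the eps-graph.
        pts, self.bucket = self.bucket, []
        unvisited = set(range(len(pts)))
        while unvisited:
            start = unvisited.pop()
            group, queue = [start], deque([start])
            while queue:
                i = queue.popleft()
                near = [j for j in unvisited if math.dist(pts[i], pts[j]) <= self.eps]
                for j in near:
                    unvisited.remove(j)
                    group.append(j)
                    queue.append(j)
            members = [pts[i] for i in group]
            if len(members) >= self.min_pts:
                self.profiles[self._next_id] = members  # seed of a new cluster
                self._next_id += 1
            else:
                self.noise.extend(members)              # too small: treated as noise


clusterer = BucketedIncrementalClusterer(eps=1.5, min_pts=3, bucket_size=4)
first_labels = [clusterer.add(p) for p in [(0, 0), (1, 0), (0, 1), (50, 50)]]
later_label = clusterer.add((0.5, 0.5))
```

In this sketch the first four points are all bucketed because no cluster exists yet. Filling the bucket triggers the batch step, which turns the three nearby points into a cluster and sets the isolated point aside as noise, so the later point near that cluster is labelled immediately. Deferring the decision until the bucket is full is what keeps a single stray point from being promoted to a cluster on arrival.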

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/2010f7fea431/pone.0196108.g001.jpg
