An incremental clustering method based on the boundary profile.

Affiliations

Department of Computer Science & Technology, Xi'an Jiaotong University, Xi'an, P.R. China.

China Xi'an Satellite Control Center, Xi'an, P.R. China.

Publication information

PLoS One. 2018 Apr 20;13(4):e0196108. doi: 10.1371/journal.pone.0196108. eCollection 2018.

DOI:10.1371/journal.pone.0196108
PMID:29677201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5909898/
Abstract

Many important applications continuously generate data, such as financial transaction administration, satellite monitoring, network flow monitoring, and web information processing. The data mining results are always evolving with the newly generated data. For the clustering task, it is better to incrementally update the clustering results based on the old data than to recluster all of the data from scratch. The incremental clustering approach is an essential way to solve the problem of clustering with growing Big Data. This paper proposes a boundary-profile-based incremental clustering (BPIC) method to find arbitrarily shaped clusters in dynamically growing datasets. This method represents the existing clustering results with a collection of boundary profiles and discards the inner points of clusters rather than keeping all of the data, which greatly reduces both time and storage costs. To identify the boundary profile, this paper presents a boundary-vector-based boundary point detection (BV-BPD) algorithm that summarizes the structure of the existing clusters. The BPIC method processes each new point in an online fashion and updates the clustering results in a batch mode. When a new point arrives, the BPIC method either immediately labels it or temporarily puts it into a bucket, according to the relationship between the new point and the boundary profiles. The bucket is employed to distinguish noise from the potential seeds of new clusters and to alleviate the effects of data order. When the bucket is full, the BPIC method clusters the data within it and updates the clustering results. Thus, the BPIC method is insensitive to noise and to the order of new data, which is critical for the robustness of the incremental clustering process. In the experiments, the performance of the boundary point detection algorithm BV-BPD is compared with the state-of-the-art method. The results show that BV-BPD is better than the state-of-the-art method.
Additionally, the performance of BPIC and two other incremental clustering methods is investigated in terms of clustering quality, time efficiency, and space efficiency. The experimental results indicate that the BPIC method is able to get a qualified clustering result on a large dataset with higher time and space efficiency.
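
The label-or-bucket loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BPIC or BV-BPD implementation: the abstract does not specify how a point is related to a boundary profile or how the bucket is clustered, so the sketch assumes a plain distance threshold `eps` to any stored boundary point and a single-linkage grouping of the bucket. The class name, `eps`, `bucket_size`, and `min_pts` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class BucketedIncrementalClusterer:
    """Toy label-or-bucket loop; every parameter here is an assumption."""
    profiles: Dict[int, List[Point]]  # cluster label -> stored boundary points
    eps: float                        # hypothetical distance threshold
    bucket_size: int                  # hypothetical bucket capacity
    min_pts: int = 3                  # hypothetical minimum size of a new cluster
    bucket: List[Point] = field(default_factory=list)
    noise: List[Point] = field(default_factory=list)

    def add(self, p: Point) -> Optional[int]:
        """Label p at once if it lies near a stored profile, else bucket it."""
        for label, boundary in self.profiles.items():
            if any(math.dist(p, b) <= self.eps for b in boundary):
                return label
        self.bucket.append(p)
        if len(self.bucket) >= self.bucket_size:
            self._flush()
        return None

    def _flush(self) -> None:
        """Batch step: group bucket points that are chained within eps."""
        remaining = list(self.bucket)
        self.bucket.clear()
        while remaining:
            group = [remaining.pop()]
            frontier = list(group)
            while frontier:
                q = frontier.pop()
                near = [r for r in remaining if math.dist(q, r) <= self.eps]
                for r in near:
                    remaining.remove(r)
                group.extend(near)
                frontier.extend(near)
            if len(group) >= self.min_pts:
                # Simplification: the whole group is stored as the profile.
                # The paper keeps only detected boundary points instead.
                new_label = max(self.profiles, default=-1) + 1
                self.profiles[new_label] = group
            else:
                # Groups that stay too small are treated as noise.
                self.noise.extend(group)
```

Because bucketed points are grouped only once the bucket fills, an isolated point is set aside as noise while a tight group becomes a new cluster regardless of the order in which its members arrived. The sketch does not update an existing profile when a point is labeled immediately, and it does not test whether a point lies in the interior of a cluster, both of which a faithful implementation would need.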

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/2010f7fea431/pone.0196108.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/080fd8b06213/pone.0196108.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/5cca64e9fa8b/pone.0196108.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/d2606a635bc7/pone.0196108.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/21fc97a91b26/pone.0196108.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/5b0cc11b4d6b/pone.0196108.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/ee10ab7ae138/pone.0196108.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/e3effa09f346/pone.0196108.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/00debef790c9/pone.0196108.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d17/5909898/4eb7b2c573de/pone.0196108.g010.jpg

Similar articles

1
An incremental clustering method based on the boundary profile.
PLoS One. 2018 Apr 20;13(4):e0196108. doi: 10.1371/journal.pone.0196108. eCollection 2018.
2
An effective density-based clustering and dynamic maintenance framework for evolving medical data streams.
Int J Med Inform. 2019 Jun;126:176-186. doi: 10.1016/j.ijmedinf.2019.03.016. Epub 2019 Mar 28.
3
Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence.
Neural Netw. 2020 Jan;121:208-228. doi: 10.1016/j.neunet.2019.08.033. Epub 2019 Sep 9.
4
Solving text clustering problem using a memetic differential evolution algorithm.
PLoS One. 2020 Jun 11;15(6):e0232816. doi: 10.1371/journal.pone.0232816. eCollection 2020.
5
A Fast Projection-Based Algorithm for Clustering Big Data.
Interdiscip Sci. 2019 Sep;11(3):360-366. doi: 10.1007/s12539-018-0294-3. Epub 2018 Jun 7.
6
A differential privacy protecting K-means clustering algorithm based on contour coefficients.
PLoS One. 2018 Nov 21;13(11):e0206832. doi: 10.1371/journal.pone.0206832. eCollection 2018.
7
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Neural Netw. 2015 Mar;63:141-55. doi: 10.1016/j.neunet.2014.10.012. Epub 2014 Dec 9.
8
Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5680-5693. doi: 10.1109/TNNLS.2024.3382033. Epub 2025 Feb 28.
9
Retro: concept-based clustering of biomedical topical sets.
Bioinformatics. 2014 Nov 15;30(22):3240-8. doi: 10.1093/bioinformatics/btu514. Epub 2014 Jul 29.
10
Incremental Interval Type-2 Fuzzy Clustering of Data Streams using Single Pass Method.
Sensors (Basel). 2020 Jun 5;20(11):3210. doi: 10.3390/s20113210.