A Fast Granular-Ball-Based Density Peaks Clustering Algorithm for Large-Scale Data.

Authors

Cheng Dongdong, Li Ya, Xia Shuyin, Wang Guoyin, Huang Jinlong, Zhang Sulan

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17202-17215. doi: 10.1109/TNNLS.2023.3300916. Epub 2024 Dec 2.

DOI: 10.1109/TNNLS.2023.3300916
PMID: 37566496
Abstract

The density peaks clustering algorithm (DP) has difficulty clustering large-scale data because it requires the full distance matrix to compute the density and $\delta$-distance of each object, which takes $O(n^2)$ time. A granular ball (GB) is a coarse-grained representation of data. It rests on the observation that an object and its local neighbors have a similar distribution and are likely to belong to the same class. Xia et al. introduced GBs into supervised learning to improve the efficiency of methods such as support vector machines, $k$-nearest neighbor classification, and rough sets. Inspired by the idea of GB, we introduce it into unsupervised learning for the first time and propose a GB-based DP algorithm, called GB-DP. First, it generates GBs from the original data with an unsupervised partitioning method. Then, it defines the density of GBs, instead of the density of objects, according to each GB's center, its radius, and the distances between its members and its center, without setting any parameters. After that, it takes the distance between the centers of two GBs as the distance between the GBs and defines the $\delta$-distance of GBs. Finally, it uses the GBs' density and $\delta$-distance to plot the decision graph, employs the DP algorithm to cluster the GBs, and expands the clustering result to the original data. Since there is no need to calculate the distance between every pair of objects, and the number of GBs is far smaller than the number of objects, GB-DP greatly reduces the running time of the DP algorithm. Compared with $k$-means, ball $k$-means, DP, DPC-KNN-PCA, FastDPeak, and DLORE-DP, GB-DP obtains similar or even better clustering results in much less running time without setting any parameters. The source code is available at https://github.com/DongdongCheng/GB-DP.
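The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see their repository for that), and several parts are assumptions made only for illustration: the helper names `make_balls` and `gb_dp`, the recursive 2-means split with a `max_size` cap of about sqrt(n), the stand-in density (ball size divided by a smoothed mean radius), and picking the `n_clusters` balls with the largest density times $\delta$-distance instead of reading centers off a decision graph. The abstract states that GB-DP needs no parameters, so the two arguments here belong to the sketch only.

```python
import numpy as np

def make_balls(X, max_size):
    """Split X into small groups ("balls") by repeated 2-means (assumed rule)."""
    stack, balls = [np.arange(len(X))], []
    while stack:
        idx = stack.pop()
        pts, mask = X[idx], None
        if len(idx) > max_size:
            # Seed with two far-apart points, then run a few Lloyd iterations.
            a = pts[np.argmax(np.linalg.norm(pts - pts.mean(axis=0), axis=1))]
            b = pts[np.argmax(np.linalg.norm(pts - a, axis=1))]
            for _ in range(5):
                new = np.linalg.norm(pts - a, axis=1) <= np.linalg.norm(pts - b, axis=1)
                if new.all() or not new.any():
                    break  # degenerate split, keep the last valid one
                mask = new
                a, b = pts[mask].mean(axis=0), pts[~mask].mean(axis=0)
        if mask is None:
            balls.append(idx)  # small enough, or not splittable
        else:
            stack += [idx[mask], idx[~mask]]
    return balls

def gb_dp(X, n_clusters, max_size=None):
    """Density-peaks clustering on ball centers, expanded back to the points."""
    X = np.asarray(X, dtype=float)
    if max_size is None:
        max_size = max(2, int(np.sqrt(len(X))))
    balls = make_balls(X, max_size)
    centers = np.array([X[b].mean(axis=0) for b in balls])
    sizes = np.array([len(b) for b in balls])
    radii = np.array([np.linalg.norm(X[b] - c, axis=1).mean()
                      for b, c in zip(balls, centers)])
    # Stand-in density (assumption): many members in a tight ball -> high density.
    rho = sizes / (radii + radii.mean() + 1e-12)
    # Distances between ball centers only: m x m with m much smaller than n.
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    order = np.argsort(-rho, kind="stable")  # densest ball first
    delta = np.full(len(balls), D.max())     # densest ball gets the max distance
    parent = np.full(len(balls), -1)
    for rank in range(1, len(order)):
        i, higher = order[rank], order[:rank]
        j = higher[np.argmin(D[i, higher])]  # nearest ball of higher density
        delta[i], parent[i] = D[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # densest ball is always a center
    labels = np.full(len(balls), -1)
    top = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels[top] = np.arange(len(top))
    for i in order:                          # parents are labeled before children
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    point_labels = np.empty(len(X), dtype=int)
    for b, lab in zip(balls, labels):
        point_labels[b] = lab                # expand ball labels to the points
    return point_labels
```

Calling `gb_dp(X, n_clusters=2)` on an `(n, d)` array returns one integer label per row. The only pairwise distance matrix built is the one between ball centers, which is where the saving over the $O(n^2)$ object-level matrix comes from.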

