AutoSCAN: automatic detection of DBSCAN parameters and efficient clustering of data in overlapping density regions.

Authors

Bushra Adil Abdu, Kim Dongyeon, Kan Yejin, Yi Gangman

Affiliations

Department of Multimedia Engineering, Dongguk University, Seoul, South Korea.

Department of Artificial Intelligence, Dongguk University, Seoul, South Korea.

Publication information

PeerJ Comput Sci. 2024 Mar 14;10:e1921. doi: 10.7717/peerj-cs.1921. eCollection 2024.

DOI:10.7717/peerj-cs.1921
PMID:38660211
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11042006/
Abstract

Density-based clustering is considered a robust approach to unsupervised clustering because it can identify outliers, form clusters of irregular shapes, and automatically determine the number of clusters. These properties have made its pioneering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), applicable to datasets in which varying numbers of clusters of different shapes and sizes can be detected with little user intervention. However, the original algorithm has limitations, most notably its sensitivity to the user-supplied input parameters minPts and ε. In addition, the algorithm assigns inconsistent cluster labels to data objects that lie in overlapping density regions of separate clusters, which lowers its accuracy. To alleviate these problems and increase clustering accuracy, we propose two methods that use statistics from the k-nearest neighbor density distribution of a given dataset to determine the optimal ε values. Our approach removes this burden from users and automatically detects the clusters of a given dataset. Furthermore, we propose and implement a method to identify the accurate border objects of separate clusters, which resolves the unpredictability of the original algorithm. Finally, our experiments show that our efficient re-implementation of the original algorithm, which clusters datasets automatically and improves the clustering quality of adjoining cluster members, achieves higher clustering accuracy and faster running times than earlier approaches.
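
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' AutoSCAN implementation. It assumes a simple stand-in statistic (the median of the k-th nearest neighbor distances) for choosing ε, and it assumes that a border point lying within reach of several clusters is assigned to the cluster of its nearest core point. The function names `kth_neighbor_distances`, `estimate_eps`, and `dbscan_nearest_core` are hypothetical, and the abstract does not state which statistics or border rule the paper actually uses.

```python
import math
import statistics


def kth_neighbor_distances(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points, k):
    """Assumed heuristic (not from the paper): median of the k-distances."""
    return statistics.median(kth_neighbor_distances(points, k))


def dbscan_nearest_core(points, eps, min_pts):
    """Minimal brute-force DBSCAN; returns one label per point, -1 is noise.

    Assumption for illustration: a border point takes the label of its
    nearest core point instead of the first cluster that happens to reach it.
    """
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n

    # Step 1: link core points that are within eps of each other.
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            cur = stack.pop()
            for j in neighbors[cur]:
                if is_core[j] and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1

    # Step 2: assign each non-core point to its nearest core neighbor, if any.
    # Points with no core neighbor keep the noise label -1.
    for i in range(n):
        if is_core[i]:
            continue
        cores = [j for j in neighbors[i] if is_core[j]]
        if cores:
            nearest = min(cores, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return labels


# Toy data: two 3x3 grids far apart, plus one isolated point.
grid_a = [(x, y) for x in range(3) for y in range(3)]
grid_b = [(x + 10, y + 10) for x in range(3) for y in range(3)]
points = grid_a + grid_b + [(30, 30)]

k = 3
eps = estimate_eps(points, k)
labels = dbscan_nearest_core(points, eps, min_pts=k + 1)
print(eps, labels)
```

On this toy data the sketch finds the two grids as separate clusters and marks the isolated point as noise. The pairwise distance computation is quadratic in the number of points, so a spatial index would be needed for anything beyond small examples.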

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f61/11042006/64f30235ca2c/peerj-cs-10-1921-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f61/11042006/15950d581cfd/peerj-cs-10-1921-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f61/11042006/95c80047f948/peerj-cs-10-1921-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f61/11042006/f6729b3e0842/peerj-cs-10-1921-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f61/11042006/8f800e250172/peerj-cs-10-1921-g005.jpg

Cited by

1
Optics-free Spatial Genomics for Mapping Mouse Brain Aging.
bioRxiv. 2024 Aug 8:2024.08.06.606712. doi: 10.1101/2024.08.06.606712.
