Fast clustering using adaptive density peak detection.

Authors

Wang Xiao-Feng, Xu Yifan

Affiliations

1 Department of Quantitative Health Sciences/Biostatistics Section, Cleveland Clinic Lerner Research Institute, Cleveland, OH, USA.

2 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.

Publication information

Stat Methods Med Res. 2017 Dec;26(6):2800-2811. doi: 10.1177/0962280215609948. Epub 2015 Oct 16.

DOI: 10.1177/0962280215609948
PMID: 26475830

Abstract

Common limitations of clustering methods include slow convergence, instability caused by having to pre-specify a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm that finds cluster centers from their local densities. However, the selection of the key intrinsic parameters in that algorithm was not systematically investigated. Estimating the "optimal" parameters is relatively difficult because the algorithm's original definition of local density is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, in which the local density is estimated by nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations that have statistical theoretical justification. We also develop an automatic method for selecting cluster centroids by maximizing an average silhouette index. The advantages and flexibility of the proposed method are demonstrated through simulation studies and the analysis of several benchmark gene expression data sets. The method runs in a single step without any iteration, so it is fast and has great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
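
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the ADPclust implementation. The abstract does not give the bandwidth equations or the rule for ranking candidate centers, so the normal-reference bandwidth, the ranking by density times distance, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance matrix between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kernel_density(D, h, dim):
    # Gaussian kernel density estimate at each data point, common bandwidth h.
    n = D.shape[0]
    return np.exp(-0.5 * (D / h) ** 2).sum(axis=1) / (n * (h * np.sqrt(2 * np.pi)) ** dim)

def delta_and_parent(D, f):
    # delta[i]: distance to the nearest point of higher density;
    # parent[i]: index of that point. The densest point gets its largest distance.
    order = np.argsort(-f, kind="stable")
    delta = np.empty(len(f))
    parent = np.full(len(f), -1)
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, len(order)):
        i, higher = order[rank], order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    return delta, parent, order

def assign(order, parent, centers):
    # One pass in decreasing density: each point inherits its parent's label.
    labels = np.full(len(order), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def mean_silhouette(D, labels):
    # Average silhouette index; singleton clusters contribute 0.
    ks = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def adp_sketch(X, k_range=range(2, 7)):
    n, d = X.shape
    D = pairwise_dist(X)
    # Assumed bandwidth: multivariate normal-reference rule of thumb.
    h = (4 / (d + 2)) ** (1 / (d + 4)) * n ** (-1 / (d + 4)) * X.std(axis=0, ddof=1).mean()
    f = kernel_density(D, h, d)
    delta, parent, order = delta_and_parent(D, f)
    score = f * delta            # assumed ranking of candidate centers
    score[order[0]] = np.inf     # the densest point has no parent, so it must be a center
    ranked = np.argsort(-score, kind="stable")
    best = None
    for k in k_range:            # keep the k with the largest average silhouette
        labels = assign(order, parent, ranked[:k])
        sil = mean_silhouette(D, labels)
        if best is None or sil > best[0]:
            best = (sil, k, labels)
    return best
```

Calling `adp_sketch(X)` on an `(n, d)` NumPy array returns the best average silhouette, the chosen number of clusters, and the label vector. Densities, distances, and parent links are computed once, and each candidate number of clusters needs only one assignment pass, which reflects the non-iterative character the abstract emphasizes.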

