Clusterdv: a simple density-based clustering method that is robust, general and automatic.

Affiliations

Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, Doca de Pedrouços, Lisboa, Portugal.

Rowland Institute at Harvard, 100 Edwin H. Land Boulevard, Cambridge, MA, USA.

Publication information

Bioinformatics. 2019 Jun 1;35(12):2125-2132. doi: 10.1093/bioinformatics/bty932.

DOI:10.1093/bioinformatics/bty932
PMID:30407500
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6581440/
Abstract

MOTIVATION

How to partition a dataset into distinct clusters is a ubiquitous and challenging problem. Data vary widely in cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, which makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, which is based on a search for density peaks, can successfully cluster many kinds of data, but it is not fully automatic and fails on some simple data distributions.
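As a rough illustration of the density-peak idea that clusterdp builds on, the sketch below computes two quantities per point: a local density `rho` and the distance `delta` to the nearest point of higher density. It follows the general approach of the 2014 Science paper "Clustering by fast search and find of density peaks" and is not the clusterdp code; the function name `density_peak_scores`, the Gaussian kernel, the `cutoff` value and the toy data are all assumptions made for illustration.

```python
import math


def density_peak_scores(points, cutoff):
    """Return (rho, delta) for each point.

    rho[i]   : Gaussian-kernel local density around point i (assumed kernel).
    delta[i] : distance from point i to the nearest point with higher rho;
               for the densest point, the largest distance to any other point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-((dist[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Visit points from densest to sparsest so that, for each point,
    # all points of higher density have already been seen.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


# Hypothetical toy data: two small blobs centered at (0, 0) and (5, 5).
blob = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1)]
points = blob + [(x + 5.0, y + 5.0) for x, y in blob]

rho, delta = density_peak_scores(points, cutoff=0.5)

# Candidate centers: points that sit far from any denser point.
# The threshold 1.0 is hand-picked for this toy example.
centers = sorted(i for i in range(len(points)) if delta[i] > 1.0)
print(centers)
```

Points with both high `rho` and high `delta` are candidate cluster centers. In the original density-peaks formulation that choice is made by inspecting a plot of the two quantities, which is presumably the manual step the abstract alludes to; here it shows up as the hand-picked threshold.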

RESULTS

We propose an alternative approach, clusterdv, which estimates density dips between points and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method correctly clusters a range of synthetic and experimental datasets whose underlying structure is known, and that it identifies consistent and meaningful clusters in new behavioral data.
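One way to picture a "density dip between points" is to estimate the density along the straight segment joining two points and compare its lowest value with the density at the endpoints. The sketch below does this with a simple Gaussian kernel density estimate. It is only an illustration under stated assumptions, not the published clusterdv algorithm: the function names `gaussian_kde` and `density_dip`, the fixed `bandwidth`, the straight-line path and the ratio used as a dip score are all hypothetical choices.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that estimates density at x with an isotropic Gaussian kernel."""
    dim = len(points[0])
    norm = len(points) * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for p in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq / (2 * bandwidth ** 2))
        return total / norm

    return density


def density_dip(density, a, b, steps=50):
    """Lowest density on the segment a-b divided by the lower endpoint density.

    Values near 1 mean no valley between a and b; values near 0 mean a deep valley.
    """
    samples = []
    for i in range(steps + 1):
        t = i / steps
        x = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        samples.append(density(x))
    return min(samples) / min(samples[0], samples[-1])


# Hypothetical toy data: two small blobs centered at (0, 0) and (5, 5).
blob = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1)]
data = blob + [(x + 5.0, y + 5.0) for x, y in blob]

density = gaussian_kde(data, bandwidth=0.5)

within = density_dip(density, data[0], data[1])   # two points in the same blob
between = density_dip(density, data[0], data[5])  # one point from each blob
print(within, between)
```

In this toy setting the score stays close to 1 for two points in the same blob and falls close to 0 for points in different blobs, which is the kind of signal a dip-based method could use to decide whether two density peaks belong to separate clusters.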

AVAILABILITY AND IMPLEMENTATION

clusterdv is implemented in MATLAB. Its source code, together with example datasets, is available at https://github.com/jcbmarques/clusterdv.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebb7/6581440/901ddeddaf6d/bty932f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebb7/6581440/3a205c5257d8/bty932f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebb7/6581440/d8a615249164/bty932f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebb7/6581440/5a116019fd65/bty932f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebb7/6581440/e9b933c85eb6/bty932f5.jpg

Similar articles

1
Clusterdv: a simple density-based clustering method that is robust, general and automatic.
Bioinformatics. 2019 Jun 1;35(12):2125-2132. doi: 10.1093/bioinformatics/bty932.
2
Robust clustering of noisy high-dimensional gene expression data for patients subtyping.
Bioinformatics. 2018 Dec 1;34(23):4064-4072. doi: 10.1093/bioinformatics/bty502.
3
Spectral clustering based on learning similarity matrix.
Bioinformatics. 2018 Jun 15;34(12):2069-2076. doi: 10.1093/bioinformatics/bty050.
4
flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding.
Bioinformatics. 2012 Aug 1;28(15):2052-8. doi: 10.1093/bioinformatics/bts300. Epub 2012 May 17.
5
RCDPeaks: memory-efficient density peaks clustering of long molecular dynamics.
Bioinformatics. 2022 Mar 28;38(7):1863-1869. doi: 10.1093/bioinformatics/btac021.
6
Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets.
Bioinformatics. 2016 Jan 1;32(1):9-16. doi: 10.1093/bioinformatics/btv522. Epub 2015 Sep 5.
7
CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles.
Bioinformatics. 2021 May 17;37(7):921-928. doi: 10.1093/bioinformatics/btaa742.
8
densityCut: an efficient and versatile topological approach for automatic clustering of biological data.
Bioinformatics. 2016 Sep 1;32(17):2567-76. doi: 10.1093/bioinformatics/btw227. Epub 2016 Apr 23.
9
Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.
Bioinformatics. 2018 Feb 15;34(4):625-634. doi: 10.1093/bioinformatics/btx642.
10
SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation.
Bioinformatics. 2019 Oct 1;35(19):3642-3650. doi: 10.1093/bioinformatics/btz139.

Cited by

1
The adaptor protein 2 (AP2) complex modulates habituation and behavioral selection across multiple pathways and time windows.
iScience. 2024 Mar 8;27(4):109455. doi: 10.1016/j.isci.2024.109455. eCollection 2024 Apr 19.
2
Dimensionality reduction reveals separate translation and rotation populations in the zebrafish hindbrain.
Curr Biol. 2023 Sep 25;33(18):3911-3925.e6. doi: 10.1016/j.cub.2023.08.037. Epub 2023 Sep 8.

References

1
Structure of the Zebrafish Locomotor Repertoire Revealed with Unsupervised Behavioral Clustering.
Curr Biol. 2018 Jan 22;28(2):181-195.e5. doi: 10.1016/j.cub.2017.12.002. Epub 2018 Jan 4.
2
Clustering by fast search and merge of local density peaks for gene expression microarray data.
Sci Rep. 2017 Apr 19;7:45602. doi: 10.1038/srep45602.
3
Fast clustering using adaptive density peak detection.
Stat Methods Med Res. 2017 Dec;26(6):2800-2811. doi: 10.1177/0962280215609948. Epub 2015 Oct 16.
4
Comparing the performance of biomedical clustering methods.
Nat Methods. 2015 Nov;12(11):1033-8. doi: 10.1038/nmeth.3583. Epub 2015 Sep 21.
5
Machine learning. Clustering by fast search and find of density peaks.
Science. 2014 Jun 27;344(6191):1492-6. doi: 10.1126/science.1242072.
6
Complex wavelet structural similarity: a new image similarity index.
IEEE Trans Image Process. 2009 Nov;18(11):2385-401. doi: 10.1109/TIP.2009.2025923. Epub 2009 Jun 23.
7
Sensorimotor gating in larval zebrafish.
J Neurosci. 2007 May 2;27(18):4984-94. doi: 10.1523/JNEUROSCI.0615-07.2007.
8
Clustering by passing messages between data points.
Science. 2007 Feb 16;315(5814):972-6. doi: 10.1126/science.1136800. Epub 2007 Jan 11.
9
FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.
BMC Bioinformatics. 2007 Jan 4;8:3. doi: 10.1186/1471-2105-8-3.
10
Survey of clustering algorithms.
IEEE Trans Neural Netw. 2005 May;16(3):645-78. doi: 10.1109/TNN.2005.845141.