Clusterdv: a simple density-based clustering method that is robust, general and automatic.

Affiliations

Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, Doca de Pedrouços, Lisboa, Portugal.

Rowland Institute at Harvard, 100 Edwin H. Land Boulevard, Cambridge, MA, USA.

Publication information

Bioinformatics. 2019 Jun 1;35(12):2125-2132. doi: 10.1093/bioinformatics/bty932.

Abstract

MOTIVATION

How to partition a dataset into a set of distinct clusters is a ubiquitous and challenging problem. Because data vary widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, it is difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, which is based on a search for density peaks, can successfully cluster many kinds of data, but it is not fully automatic and fails on some simple data distributions.
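
To make the density-peaks idea concrete, here is a minimal, hypothetical Python sketch (not the clusterdp reference code). It assumes a Gaussian kernel of width `d_c` for each point's local density `rho`, takes `delta` as the distance to the nearest point of strictly higher density, and replaces the manual choice of cluster centres with two hand-set thresholds, `rho_min` and `delta_min`. All function and parameter names are illustrative.

```python
import math

# Hypothetical sketch of the density-peaks quantities; names are illustrative.

def decision_values(points, d_c):
    """Return (rho, delta) for each point.

    rho:   Gaussian-kernel local density with an assumed width d_c.
    delta: distance to the nearest point of strictly higher density;
           the densest point gets its largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, rho_min, delta_min):
    """Centres are points that are both dense and far from any denser point.
    The two thresholds stand in for the manual choice a user makes."""
    return [i for i in range(len(rho))
            if rho[i] > rho_min and delta[i] > delta_min]


def assign_labels(points, rho, centers):
    """Visit points from densest to sparsest; each non-centre point takes
    the label of its nearest neighbour of higher density, which has
    necessarily been labelled earlier in this order."""
    n = len(points)
    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    for i in sorted(range(n), key=lambda j: rho[j], reverse=True):
        if labels[i] != -1:
            continue
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if not higher:
            raise ValueError("the densest point must be chosen as a centre")
        nearest = min(higher, key=lambda j: math.dist(points[i], points[j]))
        labels[i] = labels[nearest]
    return labels
```

The thresholds are the step a person has to set by inspecting `rho` and `delta`; that hand-tuning is the lack of automation the abstract refers to.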

RESULTS

We propose an alternative approach, clusterdv, which estimates density dips between points and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method correctly clusters a range of synthetic and experimental datasets whose underlying structure is known, and that it identifies consistent and meaningful clusters in new behavioral data.
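
The abstract does not say how a density dip is measured, so the following is only an assumed illustration of the concept, not the clusterdv implementation. It estimates density with a fixed-bandwidth Gaussian kernel (`h` is a hypothetical parameter), samples it along the straight segment between two data points, and scores how far the lowest density on that segment falls below the lower of the two endpoint densities. The names `kde`, `density_dip` and `n_steps` are hypothetical.

```python
import math

# Hypothetical illustration of a "density dip" between two points;
# not the clusterdv implementation. Bandwidth h and n_steps are assumptions.

def kde(x, points, h):
    """Unnormalised Gaussian kernel density estimate at location x."""
    return sum(math.exp(-0.5 * (math.dist(x, p) / h) ** 2) for p in points)


def density_dip(a, b, points, h, n_steps=50):
    """Sample the density along the straight segment from a to b and
    return 1 - (lowest density on the segment) / (lower endpoint density).

    Assumes a and b are data points, so both endpoint densities are >= 1.
    0 means the density never drops below the endpoints (no valley);
    values close to 1 mean a deep valley separates a and b."""
    path = [tuple(ai + (bi - ai) * t / n_steps for ai, bi in zip(a, b))
            for t in range(n_steps + 1)]
    densities = [kde(x, points, h) for x in path]
    lowest_endpoint = min(densities[0], densities[-1])
    return 1.0 - min(densities) / lowest_endpoint
```

On two well-separated groups of points, pairs within one group score near 0 and pairs from different groups score near 1. The straight-line path and fixed bandwidth are simplifications chosen only to keep the sketch short.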

AVAILABILITY AND IMPLEMENTATION

clusterdv is implemented in Matlab. Its source code, together with example datasets, is available at https://github.com/jcbmarques/clusterdv.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
