Hammer Joseph L, Devanny Alexander J, Kaufman Laura J
Department of Chemistry, Columbia University, New York, NY, 10027, USA.
Commun Biol. 2025 Jun 10;8(1):902. doi: 10.1038/s42003-025-08332-0.
Density-based clustering is used in many contexts including in single molecule localization microscopy (SMLM), where it is commonly used to elucidate the nanoscale organization of molecules. However, little guidance is available for evaluating clustering performance, which is often dependent on user-input parameters. Here, we develop an efficient implementation of density-based cluster validation (DBCV) that can quantitatively evaluate clustering in SMLM-sized datasets. We demonstrate that maximizing DBCV scores accurately identifies clusters in noisy, simulated data. Coupling DBCV with Bayesian optimization, we outline a method, DBOpt, that selects input parameters in an unbiased manner for density-based clustering algorithms. We demonstrate that optimal parameters can be selected for popular algorithms (DBSCAN, HDBSCAN, OPTICS) with minimal user input. Finally, we show that DBOpt reports accurate feature sizes in 2D and 3D experimental data. In sum, DBOpt will improve the integrity, reproducibility, and quality of cluster analyses for SMLM and beyond.
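The core idea of the abstract is to score a clustering with DBCV and pick the input parameters that maximize that score. The sketch below is a minimal illustration of that idea and is not the authors' DBOpt code.

It makes several assumptions:

- It follows the original DBCV definition (Moulavi et al., 2014): all-points core distance, a mutual-reachability minimum spanning tree per cluster, density sparseness, and density separation.
- It uses a plain grid search over DBSCAN's `eps` in place of the Bayesian optimization described above.
- It relies on dense O(n²) distance matrices. This makes it unsuitable for SMLM-sized data, which is the problem the paper's efficient implementation addresses.
- The helper names `dbscan`, `mst_edges`, `dbcv`, and `select_eps` are hypothetical and only illustrative.

```python
import numpy as np


def pairwise_dist(X):
    # Dense Euclidean distance matrix; fine for toy data, far too costly for SMLM-sized data.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN (hypothetical helper); returns integer labels, -1 = noise."""
    D = pairwise_dist(X)
    neighbors = [np.flatnonzero(row <= eps) for row in D]  # includes the point itself
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through density-reachable core points
            p = stack.pop()
            if not core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def mst_edges(W):
    """Prim's algorithm on a dense symmetric weight matrix; returns (i, j, weight) edges."""
    n = len(W)
    in_tree = np.zeros(n, bool)
    in_tree[0] = True
    best, parent, edges = W[0].copy(), np.zeros(n, int), []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((parent[j], j, best[j]))
        in_tree[j] = True
        update = W[j] < best
        best, parent = np.where(update, W[j], best), np.where(update, j, parent)
    return edges


def dbcv(X, labels):
    """Sketch of DBCV in [-1, 1]; noise points (-1) count in |O| but belong to no cluster."""
    D, d = pairwise_dist(X), X.shape[1]
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels) if c != -1]
    if len(clusters) < 2 or min(len(c) for c in clusters) < 2:
        return -1.0  # simplification: treat undefined cases as the worst score
    core, internal, sparseness = [], [], []
    for idx in clusters:
        sub, n = D[np.ix_(idx, idx)], len(idx)
        off = ~np.eye(n, dtype=bool)
        inv = np.zeros_like(sub)
        inv[off] = (1.0 / sub[off]) ** d
        apts = (inv.sum(1) / (n - 1)) ** (-1.0 / d)  # all-points core distance
        mreach = np.maximum(sub, np.maximum(apts[:, None], apts[None, :]))
        edges = mst_edges(mreach)
        deg = np.zeros(n, int)
        for a, b, _ in edges:
            deg[a] += 1
            deg[b] += 1
        inner = deg > 1 if (deg > 1).any() else np.ones(n, bool)  # internal MST nodes
        inner_w = [w for a, b, w in edges if inner[a] and inner[b]]
        sparseness.append(max(inner_w) if inner_w else max(w for _, _, w in edges))
        core.append(apts[inner])
        internal.append(idx[inner])
    score = 0.0
    for k in range(len(clusters)):
        sep = min(
            np.maximum(D[np.ix_(internal[k], internal[m])],
                       np.maximum(core[k][:, None], core[m][None, :])).min()
            for m in range(len(clusters)) if m != k)  # density separation
        v = (sep - sparseness[k]) / max(sep, sparseness[k])
        score += len(clusters[k]) / len(X) * v
    return score


def select_eps(X, eps_grid, min_samples):
    """Grid-search stand-in for the Bayesian optimization step: maximize DBCV over eps."""
    scored = [(eps, dbcv(X, dbscan(X, eps, min_samples))) for eps in eps_grid]
    return max(scored, key=lambda t: t[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    blobs = [np.array(c) + rng.normal(scale=0.3, size=(40, 2)) for c in centers]
    X = np.vstack(blobs + [rng.uniform(-2, 7, size=(20, 2))])  # three blobs plus uniform noise
    eps, score = select_eps(X, np.linspace(0.1, 2.0, 20), min_samples=5)
    print(f"selected eps={eps:.2f}, DBCV={score:.3f}")
```

Swapping `select_eps` for a Bayesian optimizer would change only the search strategy. The objective stays "maximize DBCV over the clustering algorithm's input parameters".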