Automated calibration of consensus weighted distance-based clustering approaches using sharp.

Affiliations

Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London W2 1PG, United Kingdom.

Department of Mathematics, Imperial College London, London SW7 2RH, United Kingdom.

Publication details

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad635.

DOI: 10.1093/bioinformatics/btad635

PMID: 37847776

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10627366/

Abstract

MOTIVATION

In consensus clustering, a clustering algorithm is combined with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms the corresponding native (non-consensus) algorithms.
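As a minimal illustration of the subsample-and-recluster idea, the Python sketch below builds a consensus (co-membership) matrix: each entry is the proportion of subsamples containing both items in which the two items were assigned to the same cluster. The helpers `kmeans_1d` and `consensus_matrix`, the toy 1-D data, the subsampling fraction and the use of k-means are all assumptions made for illustration. They are not the sharp API (sharp is an R package), and the sketch does not include attribute weighting.

```python
import random


def kmeans_1d(values, k, iters=20, rng=None):
    """Minimal Lloyd's k-means on 1-D data; returns one cluster label per value."""
    rng = rng or random.Random(0)
    centers = rng.sample(values, k)  # initialise centers at k distinct data points
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center (ties go to the lowest index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster; keep the old center if a cluster empties.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def consensus_matrix(data, k, n_iter=100, frac=0.5, seed=1):
    """Hypothetical helper: co-clustering proportions over random subsamples.

    M[i][j] = (# subsamples in which i and j fall in the same cluster)
              / (# subsamples in which both i and j were drawn)
    """
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(k, int(frac * n))  # subsample size
    for _ in range(n_iter):
        idx = rng.sample(range(n), m)  # draw items without replacement
        labels = kmeans_1d([data[i] for i in idx], k, rng=rng)
        for a in range(m):
            for b in range(m):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Two well-separated groups of four items each.
data = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
M = consensus_matrix(data, k=2, n_iter=100, frac=0.8)
print(M[0][1], M[0][4])  # within-group vs between-group proportion
```

Stable clusters appear as blocks of entries near 1 separated by entries near 0. How sharp turns consensus output into its calibration score is described in the paper and is not reproduced in this sketch.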

RESULTS

Here, we extend consensus clustering to allow for attribute weighting in the calculation of pairwise distances, using existing regularized approaches. We propose a procedure for calibrating the number of clusters (and the regularization parameter) by maximizing the sharp score, a novel stability score calculated directly from consensus clustering outputs, which makes the procedure computationally very competitive. Our simulation study shows better clustering performance of (i) approaches calibrated by maximizing the sharp score compared with existing calibration scores and (ii) weighted compared with unweighted approaches in the presence of features that do not contribute to cluster definition. Application to real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes.
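To show why attribute weighting can help when some features carry no cluster information, here is a small Python sketch of a weighted Euclidean distance. The weights are set by hand (1 for the informative feature, 0 for the noise feature) purely for illustration. In the approach described above they would instead be obtained from an existing regularized method, which this sketch does not implement. The function names and toy data are hypothetical and not part of sharp.

```python
import math


def weighted_euclidean(x, y, w):
    """Attribute-weighted Euclidean distance: sqrt(sum_f w_f * (x_f - y_f)**2)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    return math.sqrt(sum(wf * (xf - yf) ** 2 for xf, yf, wf in zip(x, y, w)))


def pairwise_distances(rows, w):
    """Full matrix of weighted distances between all pairs of rows."""
    n = len(rows)
    return [[weighted_euclidean(rows[i], rows[j], w) for j in range(n)]
            for i in range(n)]


# Feature 0 separates groups A and B; feature 1 is noise unrelated to the groups.
rows = [
    (0.0, 5.0),   # item 0, group A
    (0.2, -5.0),  # item 1, group A
    (4.0, 5.0),   # item 2, group B
    (4.2, -5.0),  # item 3, group B
]
unweighted = pairwise_distances(rows, w=(1.0, 1.0))
weighted = pairwise_distances(rows, w=(1.0, 0.0))  # noise feature switched off

# Unweighted: item 0 looks closer to item 2 (other group) than to item 1.
print(unweighted[0][1], unweighted[0][2])
# Weighted: item 0 is now closest to its own group member, item 1.
print(weighted[0][1], weighted[0][2])
```

With equal weights the noise feature dominates the distances and would push a distance-based algorithm towards grouping items 0 and 2 together; down-weighting it restores the intended group structure.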

AVAILABILITY AND IMPLEMENTATION

The R package sharp (version ≥1.4.3) is available on CRAN at https://CRAN.R-project.org/package=sharp.

