Evidence accumulation clustering using combinations of features.

Author information

Wong William, Tsuchiya Naotsugu

Affiliations

School of Psychological Sciences and Turner Institute for Brain and Mental Health, Monash University.

Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology (NICT), Suita, Osaka, Japan.

Publication information

MethodsX. 2020 May 14;7:100916. doi: 10.1016/j.mex.2020.100916. eCollection 2020.

Abstract

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data for arbitrary shapes and numbers of clusters. Here, we present a variant of EAC in which we aimed to better cluster data with a large number of features, many of which may be uninformative. Our new method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings based on combinations of fewer features than the original dataset at a time. Our method also calls for prewhitening the recombined data and weighting the influence of each individual clustering by an estimate of its informativeness. We provide code of an example implementation of the algorithm in Matlab and demonstrate its effectiveness compared to ordinary evidence accumulation clustering with synthetic data.

• The clustering ensemble is made by clustering on subset combinations of features from the data

• The recombined data may be prewhitened

• Evidence accumulation can be improved by weighting the evidence with a goodness-of-clustering measure.
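
The three ideas in the abstract (an ensemble built from feature-subset combinations, preprocessing of the recombined data, and weighted evidence accumulation) can be illustrated with a minimal Python sketch. This is not the authors' Matlab implementation, and every function name in it is hypothetical. It rests on several assumptions that the abstract does not specify: k-means is used as the base clusterer, per-feature z-scoring stands in for prewhitening (true whitening would also remove correlations between features), the weight of each clustering is the fraction of variance explained by its clusters, and the final partition is obtained by linking points whose weighted co-association exceeds a fixed threshold of 0.5.

```python
import itertools
import random


def zscore(columns):
    """Standardize each feature column (a simplified stand-in for prewhitening)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        out.append([(v - mean) / std for v in col])
    return out


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=25):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def explained_fraction(points, labels):
    """Assumed goodness-of-clustering weight: 1 - within-cluster SS / total SS."""
    def scatter(group):
        center = [sum(dim) / len(group) for dim in zip(*group)]
        return sum(sq_dist(p, center) for p in group)
    total = scatter(points)
    if total == 0:
        return 0.0
    within = sum(scatter([p for p, lab in zip(points, labels) if lab == c])
                 for c in set(labels))
    return max(0.0, 1.0 - within / total)


def weighted_coassociation(labelings, weights):
    """Weighted fraction of clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    total = sum(weights) or 1.0
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


def cut_coassociation(co, threshold=0.5):
    """Group points connected by co-association above threshold (a single-linkage cut)."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def eac_feature_combinations(data, subset_size, k, seed=0):
    """Cluster every feature subset, then accumulate weighted evidence."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*data)]
    labelings, weights = [], []
    for subset in itertools.combinations(range(len(columns)), subset_size):
        cols = zscore([columns[f] for f in subset])
        points = [list(p) for p in zip(*cols)]
        labels = kmeans(points, k, rng)
        labelings.append(labels)
        weights.append(explained_fraction(points, labels))
    co = weighted_coassociation(labelings, weights)
    return co, cut_coassociation(co)
```

Calling `eac_feature_combinations(data, subset_size=2, k=2)` on a list of rows returns the weighted co-association matrix and a label per row. Because clusterings on uninformative feature pairs tend to explain less variance, they contribute less to the co-association matrix, which is the intuition behind weighting the evidence. The fixed 0.5 threshold is only one way to read a partition off the matrix; a hierarchical clustering of the matrix is a common alternative.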
