Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data.

Author information

Zhang Yiqun, Cheung Yiu-Ming

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6530-6544. doi: 10.1109/TNNLS.2022.3202700. Epub 2023 Sep 1.

Abstract

Heterogeneous attribute data, composed of attributes with different types of values, are common in real-world applications. Because data annotation is usually expensive, clustering offers a promising way to process unlabeled data, and the adopted similarity measure plays a key role in determining clustering accuracy. However, appropriately defining the similarity between data objects with heterogeneous attributes is very challenging, because values from heterogeneous attributes generally have very different characteristics. Specifically, numerical attributes take quantitative values, while categorical attributes take qualitative values. Furthermore, categorical attributes can be divided into nominal and ordinal ones according to whether their values carry order information. To bridge the gap among heterogeneous attributes, this article proposes a new dissimilarity metric for cluster analysis of such data. We first study the connections among the heterogeneous attributes and build graph representations for them. We then propose a metric that computes the dissimilarities between attribute values under the guidance of the graph structures. Finally, we develop a new k-means-type clustering algorithm associated with the proposed metric. The proposed method is able to perform cluster analysis of datasets composed of an arbitrary combination of numerical, nominal, and ordinal attributes. Experimental results show its efficacy in comparison with its counterparts.
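
The abstract does not give the metric itself, so the following is only a minimal illustrative sketch of the general idea of graph-guided dissimilarity for mixed-type data, not the authors' method. It assumes, purely for illustration, that nominal values sit on a complete graph (any two distinct values are one hop apart), that ordinal values sit on a path graph (distance is the number of hops between ranks), and that numerical values are compared by range-normalized absolute difference. All function names and the `schema` layout are hypothetical.

```python
from typing import Sequence


def numerical_dissim(a: float, b: float, lo: float, hi: float) -> float:
    """Range-normalized absolute difference, in [0, 1] when a, b lie in [lo, hi]."""
    if hi == lo:
        return 0.0
    return abs(a - b) / (hi - lo)


def nominal_dissim(a: str, b: str) -> float:
    """Assumed complete graph: distinct categories are always one hop apart."""
    return 0.0 if a == b else 1.0


def ordinal_dissim(a: str, b: str, order: Sequence[str]) -> float:
    """Assumed path graph: hops between ranks, scaled to [0, 1]."""
    if len(order) < 2:
        return 0.0
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)


def mixed_dissim(x, y, schema) -> float:
    """Average the per-attribute dissimilarities of two objects.

    schema is a list of (kind, info) pairs (hypothetical layout):
      ("num", (lo, hi)), ("nom", None), or ("ord", [ordered categories]).
    """
    total = 0.0
    for xv, yv, (kind, info) in zip(x, y, schema):
        if kind == "num":
            total += numerical_dissim(xv, yv, info[0], info[1])
        elif kind == "nom":
            total += nominal_dissim(xv, yv)
        elif kind == "ord":
            total += ordinal_dissim(xv, yv, info)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total / len(schema)


# Example: one numerical, one nominal, and one ordinal attribute.
schema = [
    ("num", (0.0, 40.0)),
    ("nom", None),
    ("ord", ["low", "medium", "high"]),
]
x = (10.0, "red", "low")
y = (20.0, "blue", "high")
print(mixed_dissim(x, y, schema))  # (0.25 + 1.0 + 1.0) / 3 = 0.75
```

A dissimilarity of this kind could then be plugged into a k-means-type or k-medoids-type loop. The article's actual graph construction and clustering algorithm are not described in the abstract and are not reproduced here.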

