Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3560-3576. doi: 10.1109/TPAMI.2021.3056510. Epub 2022 Jun 3.

Abstract

The success of categorical data clustering depends heavily on the distance metric used to measure the dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, nominal and ordinal attributes, identically when calculating dissimilarity, and so ignore the relative order of ordinal values. Moreover, nominal and ordinal attributes may be interdependent, and this interdependence is worth exploring as a further indicator of dissimilarity. This paper therefore studies the intrinsic differences and connections between nominal and ordinal attribute values from a graph-like perspective. Accordingly, we propose a novel distance metric that measures the intra-attribute distances of nominal and ordinal attributes in a unified way while preserving the order relationship among ordinal values. We then propose a new clustering algorithm that integrates the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby avoiding a suboptimal solution. Experiments demonstrate the efficacy of the proposed algorithm compared with existing counterparts.
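To make the nominal/ordinal distinction concrete, the following is a minimal illustrative sketch and not the metric proposed in the paper. It assumes a simple baseline: a 0/1 mismatch distance for nominal attributes and a normalized rank difference for ordinal attributes, combined with fixed per-attribute weights. The function names, the `ordinal_levels` mapping, and the `weights` parameter are hypothetical; the paper learns intra-attribute distance weights jointly with the partition, which this sketch does not attempt.

```python
from typing import Dict, Hashable, Optional, Sequence


def nominal_distance(a: Hashable, b: Hashable) -> float:
    """0/1 mismatch: nominal values carry no order information."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a: Hashable, b: Hashable, levels: Sequence[Hashable]) -> float:
    """Rank difference scaled to [0, 1], so the order of levels is preserved."""
    span = len(levels) - 1
    if span <= 0:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / span


def object_distance(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    ordinal_levels: Dict[int, Sequence[Hashable]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-attribute distances.

    ordinal_levels maps an attribute index to its ordered list of levels;
    any attribute index not in the mapping is treated as nominal.
    weights are fixed placeholders here (assumption), not learned values.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in ordinal_levels:
            d = ordinal_distance(a, b, ordinal_levels[i])
        else:
            d = nominal_distance(a, b)
        total += weights[i] * d
    return total


# Attribute 0 is nominal (colour); attribute 1 is ordinal (low < medium < high).
levels = {1: ["low", "medium", "high"]}
x = ("red", "low")
y = ("blue", "high")
z = ("blue", "medium")
d_xy = object_distance(x, y, levels)
d_xz = object_distance(x, z, levels)
```

Under a plain mismatch (Hamming) distance, `x` would be equally far from `y` and `z`, since both differ from `x` in two attributes. With the order-aware term, `z` ends up closer to `x` than `y` does, because "medium" is nearer to "low" than "high" is.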
