A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering.

Publication information

IEEE Trans Cybern. 2022 Feb;52(2):758-771. doi: 10.1109/TCYB.2020.2983073. Epub 2022 Feb 16.

DOI:10.1109/TCYB.2020.2983073
PMID:32340972
Abstract

An ordinal attribute has all the common characteristics of a nominal one, but differs in that its possible values (also called categories) are naturally ordered. In clustering analysis tasks, categorical data composed of both ordinal and nominal attributes (also called mixed-categorical data) are common. Under this circumstance, existing distance and similarity measures suffer from at least one of the following two drawbacks: 1) they directly treat ordinal attributes as nominal ones, and thus ignore their order information; and 2) they assume that all attributes are independent of each other, and so measure the distance between two categories of a target attribute without considering the valuable information provided by other attributes that correlate with the target one. These two drawbacks may distort the natural distances of attributes and lead to unsatisfactory clustering results. This article, therefore, presents an entropy-based distance metric that quantifies the distance between categories by exploiting the information provided by different attributes that correlate with the target one. It also preserves the order relationship among ordinal categories during distance measurement. Since attributes are usually correlated to different degrees, we also define the interdependence between different types of attributes to weight their contributions in forming distances. The proposed metric overcomes the two above-mentioned drawbacks for mixed-categorical data clustering. More importantly, it conceptually unifies the distances of ordinal and nominal attributes to avoid information loss during clustering. Moreover, it is parameter free and does not incur extra computational cost compared with existing state-of-the-art counterparts. Extensive experiments show the superiority of the proposed distance metric.
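
The abstract does not give the formulas, so the following Python sketch only illustrates the general idea and should not be read as the paper's metric. It assumes that the distance between two categories of a target attribute can be approximated by the Jensen-Shannon divergence between the conditional distributions they induce on each attribute, that normalized mutual information can stand in for the interdependence weights, and that an ordinal distance can be accumulated over adjacent categories so that order is preserved. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def cond_dist(data, r, a, s, values_s):
    """Empirical P(attribute s | attribute r == a), aligned with values_s."""
    rows = [row for row in data if row[r] == a]
    counts = Counter(row[s] for row in rows)
    n = len(rows)
    return [counts[v] / n if n else 0.0 for v in values_s]


def js_divergence(p, q):
    """Jensen-Shannon divergence: an entropy-based, symmetric dissimilarity."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return max(0.0, entropy(m) - (entropy(p) + entropy(q)) / 2)


def interdependence(data, r, s):
    """Normalized mutual information between attributes r and s (assumed weight)."""
    n = len(data)
    h_r = entropy([c / n for c in Counter(row[r] for row in data).values()])
    h_s = entropy([c / n for c in Counter(row[s] for row in data).values()])
    h_rs = entropy([c / n for c in Counter((row[r], row[s]) for row in data).values()])
    denom = max(h_r, h_s)
    return max(0.0, h_r + h_s - h_rs) / denom if denom > 0 else 0.0


def category_distance(data, r, a, b, order=None):
    """Distance between categories a and b of attribute r.

    If `order` lists the categories of an ordinal attribute from lowest to
    highest, the distance is summed over adjacent categories between a and b,
    so categories further apart in the order are never closer in distance.
    """
    if a == b:
        return 0.0
    if order is not None:
        i, j = sorted((order.index(a), order.index(b)))
        pairs = [(order[k], order[k + 1]) for k in range(i, j)]
    else:
        pairs = [(a, b)]
    total, weight_sum = 0.0, 0.0
    for s in range(len(data[0])):  # every attribute, including r itself
        values_s = sorted({row[s] for row in data})
        w = interdependence(data, r, s)
        d = sum(
            js_divergence(cond_dist(data, r, x, s, values_s),
                          cond_dist(data, r, y, s, values_s))
            for x, y in pairs
        )
        total += w * d
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Toy data: column 0 is ordinal (education level), column 1 is nominal (job type).
data = [
    ("low", "manual"), ("low", "manual"), ("low", "clerical"),
    ("mid", "clerical"), ("mid", "clerical"), ("mid", "manual"),
    ("high", "professional"), ("high", "professional"), ("high", "clerical"),
]
order = ["low", "mid", "high"]

d_low_mid = category_distance(data, 0, "low", "mid", order)
d_low_high = category_distance(data, 0, "low", "high", order)
d_jobs = category_distance(data, 1, "manual", "professional")
print(d_low_mid, d_low_high, d_jobs)
```

In this sketch the ordinal and nominal cases share one code path and differ only in whether adjacent-category distances are accumulated, which loosely mirrors the abstract's claim of a conceptually unified, parameter-free distance. The resulting category distances could then be fed into any distance-based clustering routine.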

Similar articles

1
A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering.
IEEE Trans Cybern. 2022 Feb;52(2):758-771. doi: 10.1109/TCYB.2020.2983073. Epub 2022 Feb 16.
2
A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering.
IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):39-52. doi: 10.1109/TNNLS.2019.2899381. Epub 2019 Mar 19.
3
Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3560-3576. doi: 10.1109/TPAMI.2021.3056510. Epub 2022 Jun 3.
4
Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6530-6544. doi: 10.1109/TNNLS.2022.3202700. Epub 2023 Sep 1.
5
Coupled attribute similarity learning on categorical data.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):781-97. doi: 10.1109/TNNLS.2014.2325872.
6
A New Distance Metric for Unsupervised Learning of Categorical Data.
IEEE Trans Neural Netw Learn Syst. 2016 May;27(5):1065-79. doi: 10.1109/TNNLS.2015.2436432. Epub 2015 Jun 9.
7
Adaptive metric learning vector quantization for ordinal classification.
Neural Comput. 2012 Nov;24(11):2825-51. doi: 10.1162/NECO_a_00358. Epub 2012 Aug 24.
8
Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.
IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3308-3325. doi: 10.1109/TNNLS.2017.2728138. Epub 2017 Aug 3.
9
Employing heat maps to mine associations in structured routine care data.
Artif Intell Med. 2014 Feb;60(2):79-88. doi: 10.1016/j.artmed.2013.12.003. Epub 2013 Dec 15.
10
Attribute clustering for grouping, selection, and classification of gene expression data.
IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):83-101. doi: 10.1109/TCBB.2005.17.

Cited by

1
A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19.
Artif Intell Med. 2024 Feb;148:102750. doi: 10.1016/j.artmed.2023.102750. Epub 2023 Dec 20.