A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering.

Authors

Zhang Yiqun, Cheung Yiu-Ming, Tan Kay Chen

Publication information

IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):39-52. doi: 10.1109/TNNLS.2019.2899381. Epub 2019 Mar 19.

DOI: 10.1109/TNNLS.2019.2899381
PMID: 30908240

Abstract

Ordinal data are common in many data mining and machine learning tasks. Compared to nominal data, the possible values (also called categories interchangeably) of an ordinal attribute are naturally ordered. Nevertheless, since the data values are not quantitative, the distance between two categories of an ordinal attribute is generally not well defined, which surely has a serious impact on the result of the quantitative analysis if an inappropriate distance metric is utilized. From the practical perspective, ordinal-and-nominal-attribute categorical data, i.e., categorical data associated with a mixture of nominal and ordinal attributes, is common, but the distance metric for such data has yet to be well explored in the literature. In this paper, within the framework of clustering analysis, we therefore first propose an entropy-based distance metric for ordinal attributes, which exploits the underlying order information among categories of an ordinal attribute for the distance measurement. Then, we generalize this distance metric and propose a unified one accordingly, which is applicable to ordinal-and-nominal-attribute categorical data. Compared with the existing metrics proposed for categorical data, the proposed metric is simple to use and nonparametric. More importantly, it reasonably exploits the underlying order information of ordinal attributes and statistical information of nominal attributes for distance measurement. Extensive experiments show that the proposed metric outperforms the existing counterparts on both the real and benchmark data sets.
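
The abstract does not state the metric's formula, so the following is only an illustrative sketch of how an entropy-based distance over ordinal and nominal attributes might be set up. It is not the authors' definition. It assumes each category contributes an entropy term -p log p estimated from the data, that the distance between two ordinal categories sums the terms of every category lying between them in the order (endpoints included), and that the distance between two different nominal categories sums only their own two terms. All function names, the toy data, and the unweighted sum across attributes are hypothetical.

```python
import math
from collections import Counter


def category_entropies(values, categories):
    """Per-category entropy term -p*log(p), with p estimated from `values`."""
    counts = Counter(values)
    n = len(values)
    ent = {}
    for c in categories:
        p = counts[c] / n  # Counter returns 0 for unseen categories
        ent[c] = -p * math.log(p) if p > 0 else 0.0
    return ent


def attribute_distance(a, b, categories, ent, ordinal):
    """Distance between two categories of one attribute (assumed form)."""
    if a == b:
        return 0.0
    if ordinal:
        # Use the order: add the terms of all categories from a to b inclusive.
        i, j = sorted((categories.index(a), categories.index(b)))
        return sum(ent[c] for c in categories[i:j + 1])
    # Nominal: no order to exploit, so only the two categories contribute.
    return ent[a] + ent[b]


def record_distance(x, y, attributes):
    """Unweighted sum of per-attribute distances between two records."""
    return sum(
        attribute_distance(xa, ya, cats, ent, ordinal)
        for xa, ya, (cats, ent, ordinal) in zip(x, y, attributes)
    )


# Toy data: one ordinal attribute (level) and one nominal attribute (color).
rows = [("low", "red"), ("low", "blue"), ("mid", "red"), ("high", "green")]
levels = ["low", "mid", "high"]   # listed in their natural order
colors = ["red", "blue", "green"]  # listing order carries no meaning
attributes = [
    (levels, category_entropies([r[0] for r in rows], levels), True),
    (colors, category_entropies([r[1] for r in rows], colors), False),
]

d_near = record_distance(("low", "red"), ("mid", "red"), attributes)
d_far = record_distance(("low", "red"), ("high", "red"), attributes)
```

Under these assumptions, two records whose ordinal values are further apart in the order receive a larger distance than records with adjacent values, while the nominal attribute depends only on category frequencies. A pairwise distance of this kind could then be passed to any clustering routine that accepts a custom dissimilarity.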


Similar articles

1. A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering.
   IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):39-52. doi: 10.1109/TNNLS.2019.2899381. Epub 2019 Mar 19.
2. A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering.
   IEEE Trans Cybern. 2022 Feb;52(2):758-771. doi: 10.1109/TCYB.2020.2983073. Epub 2022 Feb 16.
3. Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes.
   IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3560-3576. doi: 10.1109/TPAMI.2021.3056510. Epub 2022 Jun 3.
4. Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data.
   IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6530-6544. doi: 10.1109/TNNLS.2022.3202700. Epub 2023 Sep 1.
5. A New Distance Metric for Unsupervised Learning of Categorical Data.
   IEEE Trans Neural Netw Learn Syst. 2016 May;27(5):1065-79. doi: 10.1109/TNNLS.2015.2436432. Epub 2015 Jun 9.
6. Coupled attribute similarity learning on categorical data.
   IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):781-97. doi: 10.1109/TNNLS.2014.2325872.
7. Adaptive metric learning vector quantization for ordinal classification.
   Neural Comput. 2012 Nov;24(11):2825-51. doi: 10.1162/NECO_a_00358. Epub 2012 Aug 24.
8. Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.
   IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3308-3325. doi: 10.1109/TNNLS.2017.2728138. Epub 2017 Aug 3.
9. Comparison of ordinal and nominal classification trees to predict ordinal expert-based occupational exposure estimates in a case-control study.
   Ann Occup Hyg. 2015 Apr;59(3):324-35. doi: 10.1093/annhyg/meu098. Epub 2014 Nov 27.
10. Employing heat maps to mine associations in structured routine care data.
    Artif Intell Med. 2014 Feb;60(2):79-88. doi: 10.1016/j.artmed.2013.12.003. Epub 2013 Dec 15.