Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data.

Authors

Zhang Yiqun, Cheung Yiu-Ming

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6530-6544. doi: 10.1109/TNNLS.2022.3202700. Epub 2023 Sep 1.

DOI: 10.1109/TNNLS.2022.3202700
PMID: 36094993

Abstract

Heterogeneous attribute data, composed of attributes with different types of values, are common in real-world applications. Because data annotation is usually expensive, clustering offers a promising way to process unlabeled data, and the adopted similarity measure plays a key role in determining clustering accuracy. However, appropriately defining the similarity between data objects with heterogeneous attributes is very challenging, because values from heterogeneous attributes generally have very different characteristics. Specifically, numerical attributes take quantitative values, while categorical attributes take qualitative values. Furthermore, categorical attributes can be divided into nominal and ordinal ones according to the order information of their values. To bridge the gap among heterogeneous attributes, this article proposes a new dissimilarity metric for cluster analysis of such data. We first study the connections among the heterogeneous attributes and build graph representations for them. We then propose a metric that computes the dissimilarities between attribute values under the guidance of the graph structures. Finally, we develop a new k-means-type clustering algorithm associated with the proposed metric. The proposed method can perform cluster analysis of datasets composed of an arbitrary combination of numerical, nominal, and ordinal attributes. Experimental results show its efficacy in comparison with its counterparts.
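
The abstract does not specify how the graphs are built or how the metric is defined, so the following is only a minimal sketch of the general idea of measuring value dissimilarity along a graph, not the authors' method. It assumes, purely for illustration, that an ordinal attribute is modeled as a chain graph, a nominal attribute as a complete graph, and a numerical attribute by a range-normalized absolute difference. All function names (`chain_graph`, `complete_graph`, `hops`, `value_dissimilarity`, `object_dissimilarity`) are hypothetical.

```python
from collections import deque


def chain_graph(values):
    """Assumed model for an ordinal attribute: link rank-adjacent values only."""
    graph = {v: set() for v in values}
    for a, b in zip(values, values[1:]):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def complete_graph(values):
    """Assumed model for a nominal attribute: link every pair of distinct values."""
    return {v: set(values) - {v} for v in values}


def hops(graph, src, dst):
    """Shortest-path length in edges, found by breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError(f"{dst!r} is not reachable from {src!r}")


def value_dissimilarity(graph, a, b):
    """Shortest-path length between two values, scaled by the graph diameter."""
    diameter = max(hops(graph, u, v) for u in graph for v in graph)
    return hops(graph, a, b) / diameter if diameter else 0.0


def object_dissimilarity(x, y, graphs, ranges):
    """Average per-attribute dissimilarity between two objects (dicts).

    Attributes listed in `graphs` are categorical; all others are treated as
    numerical and normalized by the value range given in `ranges`.
    """
    total = 0.0
    for name in x:
        if name in graphs:
            total += value_dissimilarity(graphs[name], x[name], y[name])
        else:
            total += abs(x[name] - y[name]) / ranges[name]
    return total / len(x)
```

Under these assumptions, two ordinal values that are far apart in rank end up more dissimilar than adjacent ones, while any two distinct nominal values are equally dissimilar. A k-means-type algorithm could use `object_dissimilarity` in its assignment step, although the cluster-update rule for categorical attributes would need a separate design choice that the abstract does not describe.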

Similar articles

1
Graph-Based Dissimilarity Measurement for Cluster Analysis of Any-Type-Attributed Data.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6530-6544. doi: 10.1109/TNNLS.2022.3202700. Epub 2023 Sep 1.
2
Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3560-3576. doi: 10.1109/TPAMI.2021.3056510. Epub 2022 Jun 3.
3
A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering.
IEEE Trans Cybern. 2022 Feb;52(2):758-771. doi: 10.1109/TCYB.2020.2983073. Epub 2022 Feb 16.
4
A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering.
IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):39-52. doi: 10.1109/TNNLS.2019.2899381. Epub 2019 Mar 19.
5
Coupled attribute similarity learning on categorical data.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):781-97. doi: 10.1109/TNNLS.2014.2325872.
6
Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.
IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3308-3325. doi: 10.1109/TNNLS.2017.2728138. Epub 2017 Aug 3.
7
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
8
Methods for a similarity measure for clinical attributes based on survival data analysis.
BMC Med Inform Decis Mak. 2019 Oct 21;19(1):195. doi: 10.1186/s12911-019-0917-6.
9
Locally Weighted Fusion of Structural and Attribute Information in Graph Clustering.
IEEE Trans Cybern. 2019 Jan;49(1):247-260. doi: 10.1109/TCYB.2017.2771496. Epub 2017 Nov 22.
10
Identifying cell types from single-cell data based on similarities and dissimilarities between cells.
BMC Bioinformatics. 2021 May 18;22(Suppl 3):255. doi: 10.1186/s12859-020-03873-z.