
Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.

Authors

Jia Hong, Cheung Yiu-Ming

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3308-3325. doi: 10.1109/TNNLS.2017.2728138. Epub 2017 Aug 3.

DOI:10.1109/TNNLS.2017.2728138
PMID:28792907
Abstract

In clustering analysis, data attributes may contribute differently to the detection of different clusters. To address this, subspace clustering techniques have been developed, which group data objects into clusters based on subsets of attributes rather than the entire data space. However, most existing subspace clustering methods are applicable only to either numerical or categorical data, but not both. This paper therefore studies soft subspace clustering of data with both numerical and categorical attributes (called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure and the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is presented, which can effectively improve the stability and accuracy of k-means-type clustering methods on numerical, categorical, and mixed data. Experimental results on different benchmark data sets show the efficacy of the proposed approach.
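
The abstract names rival penalized competitive learning (RPCL) as the mechanism that lets the number of clusters be learned. As a rough illustration of that general idea only, the sketch below runs a textbook-style RPCL on purely numerical 2-D points with squared Euclidean distance. It is not the paper's algorithm: it has no attribute weights, no categorical attributes, and no object-cluster similarity. The function names (`rpcl`, `occupied`), the learning rates, and the toy data are all assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(points, k_init, epochs=50, lr_win=0.05, lr_rival=0.002, seed=0):
    """Generic RPCL sketch (hypothetical parameters, numerical data only).

    Start with more centers than expected clusters (k_init >= 2). For each
    input, the winner moves toward it and the runner-up (the rival) is
    pushed slightly away, so redundant centers tend to drift off the data.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k_init)]
    wins = [1] * k_init  # win counts used as a frequency-sensitive weight
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            scores = [wins[j] / total * sq_dist(x, centers[j])
                      for j in range(k_init)]
            ranked = sorted(range(k_init), key=scores.__getitem__)
            w, r = ranked[0], ranked[1]  # winner and its rival
            centers[w] = [c + lr_win * (xi - c)
                          for c, xi in zip(centers[w], x)]
            centers[r] = [c - lr_rival * (xi - c)
                          for c, xi in zip(centers[r], x)]
            wins[w] += 1
    return centers


def occupied(points, centers):
    """Indices of centers that are the nearest center for at least one point."""
    return {min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points}


# Toy data (an assumption for this sketch): two Gaussian blobs in 2-D.
_rng = random.Random(1)
data = [(_rng.gauss(cx, 0.3), _rng.gauss(cy, 0.3))
        for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(100)]
centers = rpcl(data, k_init=4)
surviving = occupied(data, centers)
print(len(surviving), "of", len(centers), "centers still own data points")
```

Counting the centers that still own points after training gives an estimate of the cluster number. How closely that estimate matches the true number depends on the learning rates and the initialization, which is one reason the abstract pairs the mechanism with a dedicated initialization method.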


Similar articles

1
Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.
IEEE Trans Neural Netw Learn Syst. 2018 Aug;29(8):3308-3325. doi: 10.1109/TNNLS.2017.2728138. Epub 2017 Aug 3.
2
Cluster Validation Method for Determining the Number of Clusters in Categorical Sequences.
IEEE Trans Neural Netw Learn Syst. 2017 Dec;28(12):2936-2948. doi: 10.1109/TNNLS.2016.2608354. Epub 2016 Sep 27.
3
Extending Data Reliability Measure to a Filter Approach for Soft Subspace Clustering.
IEEE Trans Syst Man Cybern B Cybern. 2011 Dec;41(6):1705-14. doi: 10.1109/TSMCB.2011.2160341. Epub 2011 Jul 28.
4
Subspace Weighting Co-Clustering of Gene Expression Data.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):352-364. doi: 10.1109/TCBB.2017.2705686. Epub 2017 May 18.
5
Coupled attribute similarity learning on categorical data.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):781-97. doi: 10.1109/TNNLS.2014.2325872.
6
Simultaneous Subspace Clustering and Cluster Number Estimating Based on Triplet Relationship.
IEEE Trans Image Process. 2019 Aug;28(8):3973-3985. doi: 10.1109/TIP.2019.2903294. Epub 2019 Mar 6.
7
Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3560-3576. doi: 10.1109/TPAMI.2021.3056510. Epub 2022 Jun 3.
8
An Empirical Analysis of Rough Set Categorical Clustering Techniques.
PLoS One. 2017 Jan 9;12(1):e0164803. doi: 10.1371/journal.pone.0164803. eCollection 2017.
9
Multiview Subspace Clustering With Grouping Effect.
IEEE Trans Cybern. 2022 Aug;52(8):7655-7668. doi: 10.1109/TCYB.2020.3035043. Epub 2022 Jul 19.
10
Human Motion Segmentation via Robust Kernel Sparse Subspace Clustering.
IEEE Trans Image Process. 2018;27(1):135-150. doi: 10.1109/TIP.2017.2738562.

Cited by

1
A machine learning approach for early prediction of gestational diabetes mellitus using elemental contents in fingernails.
Sci Rep. 2023 Mar 14;13(1):4184. doi: 10.1038/s41598-023-31270-y.