Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.

Authors

Pan Wei, Shen Xiaotong, Liu Binghui

Affiliations

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455.

School of Statistics, University of Minnesota, Minneapolis, MN 55455.

Publication

J Mach Learn Res. 2013 Jul 1;14(7):1865.

Abstract

Clustering analysis is widely used in many fields. Traditionally, clustering is regarded as unsupervised learning because it lacks a class label or a quantitative response variable, which is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. Beyond the novel use of a non-convex group penalty and its associated unique operating characteristics, a main advantage of this formulation is that it allows borrowing well-established results from classification and regression, such as model selection criteria, to select the number of clusters, a difficult problem in clustering analysis. In particular, we propose using generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method's promising performance.
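The abstract does not spell out the objective function, so the following is only a minimal sketch of what "clustering as penalized regression" could look like. It assumes each observation gets its own centroid, a squared-error fit term, and a truncated L1 penalty on pairwise centroid distances as one example of a non-convex grouping penalty. The function names and the parameters `lam` and `tau` are hypothetical and are not taken from the paper. The sketch evaluates the objective and reads cluster labels off fused centroids; it does not implement an optimizer.

```python
import numpy as np

def truncated_group_penalty(mu, tau):
    """Sum over pairs i < j of min(||mu_i - mu_j||_2, tau).

    Assumed penalty form: the cap at tau makes it non-convex, so
    well-separated centroids stop being pulled toward each other.
    """
    total = 0.0
    for i in range(mu.shape[0] - 1):
        dists = np.linalg.norm(mu[i + 1:] - mu[i], axis=1)
        total += np.minimum(dists, tau).sum()
    return total

def penalized_objective(x, mu, lam, tau):
    """Squared-error fit of per-observation centroids plus the grouping penalty."""
    fit = 0.5 * np.sum((x - mu) ** 2)
    return fit + lam * truncated_group_penalty(mu, tau)

def labels_from_centroids(mu, tol=1e-8):
    """Give the same label to observations whose centroids coincide (within tol)."""
    labels = -np.ones(len(mu), dtype=int)
    next_label = 0
    for i in range(len(mu)):
        if labels[i] >= 0:
            continue
        close = np.linalg.norm(mu - mu[i], axis=1) <= tol
        labels[close & (labels < 0)] = next_label
        next_label += 1
    return labels

# Toy data: two tight pairs of points that are far apart.
x = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])

# Candidate 1: no fusion, every point is its own centroid.
unfused = x.copy()
# Candidate 2: centroids fused within each pair (pair means).
fused = np.array([[0.1, 0.0], [0.1, 0.0], [5.1, 5.0], [5.1, 5.0]])

obj_unfused = penalized_objective(x, unfused, lam=1.0, tau=1.0)
obj_fused = penalized_objective(x, fused, lam=1.0, tau=1.0)
fused_labels = labels_from_centroids(fused)
```

On this toy data the fused candidate has the lower objective, which illustrates how a grouping penalty can turn the choice of clusters into a regression-style fitting problem. In such a formulation, the number of distinct fused centroids plays the role of the number of clusters that a criterion such as GCV would select.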
