Suppr超能文献

一种基于惩罚回归聚类的新算法与理论

A New Algorithm and Theory for Penalized Regression-based Clustering.

作者信息

Wu Chong, Kwon Sunghoon, Shen Xiaotong, Pan Wei

机构信息

Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA.

Department of Applied Statistics, Konkuk University, Seoul, South Korea School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.

出版信息

J Mach Learn Res. 2016;17.

Abstract

Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit, as demonstrated in Pan et al. (2013). In this paper, we develop a more efficient algorithm for scalable computation and a new theory of clustering consistency for the method. This algorithm, called DC-ADMM, combines difference of convex (DC) programming with the alternating direction method of multipliers (ADMM). This algorithm is shown to be more computationally efficient than the quadratic penalty based algorithm of Pan et al. (2013) because of the former's closed-form updating formulas. Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finite-sample mis-clustering error bound for penalized regression based clustering with the constrained regularization in a general setting. On this ground, we provide conditions for clustering consistency of the penalized clustering method. As an end product, we put R package implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN.

摘要

聚类本质上是无监督且探索性的。然而,正如Pan等人(2013年)所展示的,它可以通过带分组追踪的惩罚回归来执行。在本文中,我们为可扩展计算开发了一种更高效的算法,并为该方法建立了一种新的聚类一致性理论。这种算法称为DC - ADMM,它将凸差(DC)规划与乘子交替方向法(ADMM)相结合。由于前者具有闭式更新公式,该算法在计算上比Pan等人(2013年)基于二次惩罚的算法更高效。在数值上,我们将DC - ADMM算法与二次惩罚算法进行比较,以证明其效用和可扩展性。在理论上,我们在一般情况下为基于惩罚回归的聚类建立了一个有限样本误聚类误差界,其中带有约束正则化。在此基础上,我们为惩罚聚类方法的聚类一致性提供了条件。作为最终成果,我们在GitHub和CRAN上发布了实现PRclust的R包,其中包含各种损失和分组惩罚函数可用。

相似文献

2
Algorithms for Fitting the Constrained Lasso.用于拟合约束套索的算法
J Comput Graph Stat. 2018;27(4):861-871. doi: 10.1080/10618600.2018.1473777. Epub 2018 Aug 7.
4
Splitting Methods for Convex Clustering.凸聚类的分裂方法
J Comput Graph Stat. 2015;24(4):994-1013. doi: 10.1080/10618600.2014.948181. Epub 2015 Dec 10.
6
A Path Algorithm for Constrained Estimation.一种用于约束估计的路径算法。
J Comput Graph Stat. 2013;22(2):261-283. doi: 10.1080/10618600.2012.681248.

本文引用的文献

1
Splitting Methods for Convex Clustering.凸聚类的分裂方法
J Comput Graph Stat. 2015;24(4):994-1013. doi: 10.1080/10618600.2014.948181. Epub 2015 Dec 10.
4
Likelihood-based selection and sharp parameter estimation.基于似然性的选择与精确参数估计。
J Am Stat Assoc. 2012 Jan 1;107(497):223-232. doi: 10.1080/01621459.2011.645783. Epub 2012 Jun 11.
5
K-means clustering: a half-century synthesis.K均值聚类:半个世纪的综述
Br J Math Stat Psychol. 2006 May;59(Pt 1):1-34. doi: 10.1348/000711005X48266.
6
Survey of clustering algorithms.聚类算法综述
IEEE Trans Neural Netw. 2005 May;16(3):645-78. doi: 10.1109/TNN.2005.845141.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验