


A nonparametric clustering algorithm with a quantile-based likelihood estimator.

Authors

Hino Hideitsu, Murata Noboru

Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan, 305-8573

Publication Information

Neural Comput. 2014 Sep;26(9):2074-101. doi: 10.1162/NECO_a_00628. Epub 2014 Jun 12.

DOI: 10.1162/NECO_a_00628
PMID: 24922504
Abstract

Clustering is a representative unsupervised learning task and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumptions on the data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated in a nonparametric manner, so information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities from a data set with a sampling weight attached to each datum. Assuming each datum is sampled from a certain cluster and assigning different sampling weights depending on the clusters, the cluster-conditional information-theoretic quantities can be estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm contains no tuning parameter. Experiments show the algorithm to be comparable to or better than conventional nonparametric clustering methods.
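The iterative scheme described in the abstract — repeatedly reassigning each datum to the cluster that maximizes its cluster-conditional nonparametric log likelihood — can be sketched as below. This is only an illustrative sketch: a plain k-nearest-neighbour density estimate stands in for the paper's quantile-based likelihood estimator, uniform within-cluster weights stand in for the general sampling weights, and the farthest-point seeding, `k`, and iteration cap are assumptions, not the authors' implementation.

```python
import numpy as np
from math import gamma, pi

def knn_log_density(x, pts, k):
    """kNN density estimate: log p(x) ~ log k - log n - log V_d - d*log r_k,
    where r_k is the distance to the k-th nearest neighbour of x in pts."""
    d = pts.shape[1]
    r = max(np.sort(np.linalg.norm(pts - x, axis=1))[k - 1], 1e-12)
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    return np.log(k) - np.log(len(pts)) - np.log(v_d) - d * np.log(r)

def cluster(X, n_clusters=2, k=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Farthest-point seeding, then nearest-seed initial labels (an assumption;
    # the paper's initialization may differ).
    seeds = [X[rng.integers(len(X))]]
    for _ in range(n_clusters - 1):
        d2 = np.min([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(X[np.argmax(d2)])
    labels = np.argmin([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=0)
    for _ in range(n_iter):
        new = labels.copy()
        for i in range(len(X)):
            scores = []
            for c in range(n_clusters):
                # Cluster members, excluding the point being scored.
                pts = X[(labels == c) & (np.arange(len(X)) != i)]
                if len(pts) <= k:
                    scores.append(-np.inf)
                else:
                    # Cluster prior (by size) + cluster-conditional log likelihood.
                    scores.append(np.log(len(pts) / len(X))
                                  + knn_log_density(X[i], pts, k))
            new[i] = int(np.argmax(scores))
        if np.array_equal(new, labels):  # converged: no reassignments
            break
        labels = new
    return labels
```

On two well-separated Gaussian blobs the reassignment loop converges in a few iterations; the published algorithm additionally removes the tuning parameter (here `k`) via its quantile-based construction.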


Similar Articles

1. A nonparametric clustering algorithm with a quantile-based likelihood estimator.
   Neural Comput. 2014 Sep;26(9):2074-101. doi: 10.1162/NECO_a_00628. Epub 2014 Jun 12.
2. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
   Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
3. Information-theoretic semi-supervised metric learning via entropy regularization.
   Neural Comput. 2014 Aug;26(8):1717-62. doi: 10.1162/NECO_a_00614. Epub 2014 May 30.
4. A robust information clustering algorithm.
   Neural Comput. 2005 Dec;17(12):2672-98. doi: 10.1162/089976605774320548.
5. Minimax mutual information approach for independent component analysis.
   Neural Comput. 2004 Jun;16(6):1235-52. doi: 10.1162/089976604773717595.
6. Visual MRI: merging information visualization and non-parametric clustering techniques for MRI dataset analysis.
   Artif Intell Med. 2008 Nov;44(3):183-99. doi: 10.1016/j.artmed.2008.06.006. Epub 2008 Sep 4.
7. Information estimators for weighted observations.
   Neural Netw. 2013 Oct;46:260-75. doi: 10.1016/j.neunet.2013.06.005. Epub 2013 Jun 24.
8. Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure.
   J Comput Biol. 2004;11(4):660-82. doi: 10.1089/1066527041887285.
9. A conditional entropy minimization criterion for dimensionality reduction and multiple kernel learning.
   Neural Comput. 2010 Nov;22(11):2887-923. doi: 10.1162/NECO_a_00027.
10. Analysis of a Gibbs sampler method for model-based clustering of gene expression data.
   Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.