Approximating Dunn's Cluster Validity Indices for Partitions of Big Data.

Publication information

IEEE Trans Cybern. 2019 May;49(5):1629-1641. doi: 10.1109/TCYB.2018.2806886. Epub 2018 Mar 5.

Abstract

Dunn's internal cluster validity index is used to assess partition quality and subsequently identify a "best" crisp partition of n objects. Computing Dunn's index (DI) for partitions of n p-dimensional feature vector data has quadratic time complexity O(pn²), so its computation is impractical for very large values of n. This note presents six methods for approximating DI. Four methods are based on Maximin sampling, which identifies a skeleton of the full partition that contains some boundary points in each cluster. Two additional methods are presented that estimate boundary points associated with unsupervised training of one-class support vector machines. Numerical examples compare approximations to DI based on all six methods. Four experiments on seven real and synthetic data sets support our assertion that computing approximations to DI with an incremental, neighborhood-based Maximin skeleton is both tractable and reliably accurate.
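
To make the cost argument concrete, here is a minimal Python sketch of the exact index and of one way a Maximin skeleton could stand in for the full partition. It is an illustration under stated assumptions, not a reproduction of any of the paper's six methods. It assumes Euclidean distance, the classical form of the index (smallest between-cluster point distance divided by largest cluster diameter), at least two clusters, and at least one cluster with two distinct points. The function names and the `m_per_cluster` parameter are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def dunn_index(X, labels):
    """Exact Dunn's index: min between-cluster distance / max cluster diameter."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Largest within-cluster distance (diameter) over all clusters.
    max_diameter = max(pairwise_distances(c, c).max() for c in clusters)
    # Smallest distance between two points that lie in different clusters.
    min_separation = min(
        pairwise_distances(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter

def maximin_sample(X, m, start=0):
    """Pick m row indices greedily, each one farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = np.linalg.norm(X - X[start], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

def approx_dunn_index(X, labels, m_per_cluster):
    """Dunn's index evaluated on a Maximin skeleton drawn from each cluster.

    Assumption for illustration only: sampling is done per cluster with a
    fixed budget; the paper's incremental, neighborhood-based variant differs.
    """
    keep = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        m = min(m_per_cluster, idx.size)
        keep.extend(idx[maximin_sample(X[idx], m)])
    keep = np.array(keep)
    return dunn_index(X[keep], labels[keep])
```

The exact computation compares all pairs of points, which is where the quadratic cost in n comes from; the skeleton version compares only the sampled points. Because a subset can only shrink the measured diameters and enlarge the measured separations, this kind of estimate never falls below the exact value, so it is best read as an upper bound that tightens as more boundary points are sampled.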
