Balancing effort and benefit of K-means clustering algorithms in Big Data realms.

Author affiliations

Departamento de Ciencias Computacionales/Centro Nacional de Investigación y Desarrollo Tecnológico, Tecnológico Nacional de México, Cuernavaca, Morelos, Mexico.

Instituto de Matemáticas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico.

Publication information

PLoS One. 2018 Sep 5;13(9):e0201874. doi: 10.1371/journal.pone.0201874. eCollection 2018.

Abstract

In this paper we propose a criterion for balancing processing time against solution quality in k-means clustering algorithms applied to instances where the number n of objects is large. Most known strategies for improving the performance of k-means algorithms target the initialization or classification steps. In contrast, our criterion applies to the convergence step: the process stops as soon as the number of objects that change their assigned cluster in an iteration falls below a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time by a factor of about 4/100, while solution quality degrades by less than two percent. These findings suggest that the criterion is useful in Big Data realms.
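
The stopping rule described in the abstract can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kmeans_early_stop`, its parameters, the random initialization, and the floor of one object on the threshold (so that a fraction of 0 reproduces the classical "no object changed" rule) are all hypothetical choices made for the example.

```python
import numpy as np


def kmeans_early_stop(X, k, threshold_frac=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once few objects change cluster.

    Assumption: the threshold is expressed as a fraction of n, with a
    floor of 1 object so that threshold_frac=0 means "stop when no
    object changes", i.e. the classical convergence rule.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    threshold = max(1.0, threshold_frac * n)

    # Assumed initialization: k distinct objects chosen uniformly at random.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)

    for iteration in range(1, max_iter + 1):
        # Classification step: assign each object to its nearest centroid.
        # Builds an n x k distance matrix, fine for a sketch, costly at scale.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)

        # Count the objects whose assigned cluster changed in this iteration.
        changed = int(np.count_nonzero(new_labels != labels))
        labels = new_labels

        # Update step: recompute each centroid; keep the old one if empty.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

        # Convergence step: stop when fewer than `threshold` objects moved.
        if changed < threshold:
            break

    return labels, centroids, iteration
```

Calling the function twice with the same `seed`, once with `threshold_frac=0.03` and once with `threshold_frac=0.0`, follows the same sequence of iterations from the same starting centroids, so the first run can never take more iterations than the second. How much time is saved and how much quality is lost depends on the data; the figures quoted in the abstract come from the authors' experiments, not from this sketch.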

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d06/6124732/d04def8c4766/pone.0201874.g001.jpg
