Xu Rui, Wunsch Donald
Department of Electrical and Computer Engineering, University of Missouri-Rolla, Rolla, MO 65409, USA.
IEEE Trans Neural Netw. 2005 May;16(3):645-78. doi: 10.1109/TNN.2005.845141.
Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, a primitive form of exploration that requires little or no prior knowledge, encompasses research developed across a wide variety of communities. On one hand, this diversity equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications to some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several closely related topics, including proximity measures and cluster validation, are also discussed.