Department of Mathematics and Applications, University of Napoli Federico II, Naples, Italy.
Math Biosci. 2013 Sep;245(1):76-85. doi: 10.1016/j.mbs.2013.07.011. Epub 2013 Jul 26.
Cluster analysis aims at finding subsets (clusters) of a given set of entities that are homogeneous and/or well separated. Since the 1990s, it has been applied in a wide range of domains. It has emerged as one of the most exciting interdisciplinary fields, benefiting from concepts and theoretical results from different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms that can solve larger, real-world instances. We give an overview of the main types of clustering and of the criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed for clustering biological data.
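The notions of homogeneity and separation mentioned in the abstract can be made concrete with a small sketch. The abstract does not say which criteria the paper uses, so the two below (cluster diameter for homogeneity, split for separation) are assumptions chosen as common examples, and the function names, the Euclidean distance, and the sample points are all hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def diameter(points, labels):
    """Homogeneity criterion (assumed example): the largest distance
    between two points assigned to the same cluster. Smaller is better."""
    return max(
        (dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]),
        default=0.0,  # no pair shares a cluster
    )


def split(points, labels):
    """Separation criterion (assumed example): the smallest distance
    between two points assigned to different clusters. Larger is better."""
    return min(
        (dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] != labels[j]),
        default=float("inf"),  # only one cluster, nothing to separate
    )


# Hypothetical data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # groups distant points together

print(diameter(points, good_labels), split(points, good_labels))
print(diameter(points, bad_labels), split(points, bad_labels))
```

On this toy data the first labeling has a small diameter and a large split, while the second has the reverse, which is the sense in which a partition can be judged homogeneous, well separated, or both. Seen from the combinatorial optimization perspective, a clustering method searches over label assignments to optimize a criterion of this kind.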