Combined Gaussian Mixture Model and Pathfinder Algorithm for Data Clustering

Authors

Huang Huajuan, Liao Zepeng, Wei Xiuxi, Zhou Yongquan

Affiliations

College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China.

Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Nanning 530006, China.

Publication information

Entropy (Basel). 2023 Jun 16;25(6):946. doi: 10.3390/e25060946.

DOI:10.3390/e25060946

PMID:37372290

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10296861/

Abstract

Data clustering is one of the most influential branches of machine learning and data analysis, and Gaussian Mixture Models (GMMs) are frequently adopted for it because they are easy to implement. The approach has two notable limitations: the number of clusters must be set manually, and the initialization may fail to exploit the information contained in the dataset. To address these issues, we propose a new clustering algorithm called PFA-GMM, which combines GMMs with the Pathfinder algorithm (PFA). First, the algorithm automatically determines the optimal number of clusters from the dataset. Second, PFA-GMM treats clustering as a global optimization problem, to avoid getting trapped in local convergence during initialization. Finally, we compare the proposed algorithm against other well-known clustering algorithms on both synthetic and real-world datasets. The experimental results indicate that PFA-GMM outperformed the competing approaches.
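To make the idea of choosing the number of clusters from the data concrete, the following is a minimal sketch and not the authors' method. It fits one-dimensional Gaussian mixtures by plain expectation-maximization (EM) and picks the component count with the lowest Bayesian information criterion (BIC). The abstract does not say which criterion PFA-GMM uses, so BIC is an assumption made for illustration, and the Pathfinder algorithm is not implemented here: a deterministic quantile-based initialization stands in for it. The names `fit_gmm_1d`, `bic` and `select_k` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with k components by plain EM.

    Assumes len(data) >= k. Initialization is deterministic (equal-size
    chunks of the sorted data); it is a stand-in, not the PFA search.
    """
    n = len(data)
    ordered = sorted(data)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)  # guard against collapse

    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    n_params = 3 * k - 1
    return n_params * math.log(n) - 2.0 * log_lik


def select_k(data, k_max):
    """Return the component count with the lowest BIC, plus all scores."""
    scores = {}
    for k in range(1, k_max + 1):
        log_lik = fit_gmm_1d(data, k)[3]
        scores[k] = bic(log_lik, k, len(data))
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data: two well-separated clusters.
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(10.0, 1.0) for _ in range(200)])
best_k, scores = select_k(data, k_max=4)
print("selected number of components:", best_k)
```

EM started from a single initialization can still converge to a poor local optimum, which is the weakness the abstract says PFA-GMM targets by treating clustering as a global optimization problem.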
