Zhu Ailin, Hua Zexi, Shi Yu, Tang Yongchuan, Miao Lingwei
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China.
School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China.
Entropy (Basel). 2021 Nov 21;23(11):1550. doi: 10.3390/e23111550.
The clustering quality of the k-means algorithm depends mainly on the selection of the initial cluster centers and on the distance measure between sample points. The traditional k-means algorithm measures the distance between sample points with the Euclidean distance, which discriminates poorly between the attributes of sample points and makes the algorithm prone to local optima. To address this problem, this paper proposes an improved k-means algorithm based on evidence distance. First, the attribute values of each sample point are modelled as the basic probability assignment (BPA) of that sample point. Then, the evidence distance replaces the traditional Euclidean distance as the measure of distance between sample points. Finally, k-means clustering is carried out on UCI datasets. The proposed method is compared experimentally with the traditional k-means algorithm, the k-means algorithm based on the aggregation distance parameter, and the Gaussian mixture model. The experimental results show that the improved k-means algorithm based on evidence distance achieves a better clustering effect and also converges better.
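As a rough illustration of the idea in the abstract, the sketch below runs k-means over BPAs with an evidence distance in place of the Euclidean distance. It rests on several assumptions that the abstract does not confirm: the evidence distance is taken to be the Jousselme distance, d(m1, m2) = sqrt(0.5 (m1 - m2)^T D (m1 - m2)) with D[A, B] = |A ∩ B| / |A ∪ B|; the BPAs are assumed to be already constructed (the paper's procedure for deriving them from attribute values is not reproduced here); and cluster centers are updated as the mean BPA of their members. All function names (`jaccard_matrix`, `evidence_distance`, `assign_clusters`, `evidence_kmeans`) are hypothetical.

```python
import numpy as np

def jaccard_matrix(focal_sets):
    """D[i, j] = |A_i & A_j| / |A_i | A_j| for non-empty focal sets."""
    n = len(focal_sets)
    D = np.empty((n, n))
    for i, a in enumerate(focal_sets):
        for j, b in enumerate(focal_sets):
            D[i, j] = len(a & b) / len(a | b)
    return D

def evidence_distance(m1, m2, D):
    """Jousselme distance between two BPAs given as mass vectors (assumed metric)."""
    diff = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    # Clamp tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(0.5 * diff @ D @ diff, 0.0)))

def assign_clusters(bpas, centers, D):
    """Label each BPA with the index of its nearest center."""
    diffs = bpas[:, None, :] - centers[None, :, :]           # shape (n, k, f)
    sq = 0.5 * np.einsum("nkf,fg,nkg->nk", diffs, D, diffs)  # squared distances
    return np.argmin(sq, axis=1)

def evidence_kmeans(bpas, k, D, n_iter=100, seed=0):
    """k-means loop using the evidence distance; centers are mean BPAs (assumption)."""
    bpas = np.asarray(bpas, dtype=float)
    rng = np.random.default_rng(seed)
    centers = bpas[rng.choice(len(bpas), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_clusters(bpas, centers, D)
        new_centers = np.array([
            bpas[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy frame of discernment {a, b} with focal elements {a}, {b}, {a, b}.
focal_sets = [frozenset("a"), frozenset("b"), frozenset("ab")]
D = jaccard_matrix(focal_sets)
toy_bpas = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.0],
])
labels, centers = evidence_kmeans(toy_bpas, k=2, D=D)
```

Because the mean of several BPAs still sums to one, the averaged centers remain valid BPAs, which is why the mean update is a convenient choice for this sketch; the toy data are made up for illustration and are not drawn from the UCI datasets used in the paper.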