College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
Engineering Lab of Intelligence Business & Internet of Things, Xinxiang 453007, China.
Sensors (Basel). 2022 Aug 17;22(16):6163. doi: 10.3390/s22166163.
The rapid growth of digital information has produced massive amounts of feature-rich time series data. Most of these data are noisy and contain outlier samples, which degrades clustering performance. To efficiently discover the hidden statistical information in such data, this study proposes a fast weighted fuzzy C-medoids clustering algorithm based on P-splines (PS-WFCMdd) for time series datasets. Specifically, the P-spline method is used to fit functional data to the original time series, and the resulting smoothed data are used as the input to the clustering algorithm, which improves its ability to handle the dataset during clustering. We then define a new weighting method that further reduces the influence of outlier sample points in the weighted fuzzy C-medoids clustering process and thereby improves the robustness of the algorithm. To further improve clustering efficiency, we propose using the third version of Mueen's Algorithm for Similarity Search (MASS 3) to measure the similarity between time series quickly and accurately. The new algorithm is compared with several other time series clustering algorithms, and its performance is evaluated experimentally on different types of time series examples. The experimental results show that the new method speeds up data processing and that its overall performance across the clustering evaluation indices is relatively good.
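To make the clustering step concrete, the following is a minimal sketch of a plain (unweighted) fuzzy C-medoids loop. It is not the PS-WFCMdd algorithm itself: the P-spline smoothing and the new weighting method are omitted because the abstract does not specify them, and a z-normalized Euclidean distance between equal-length series stands in for MASS 3. The function names (`znorm`, `pairwise_dist`, `init_medoids`, `fuzzy_c_medoids`), the farthest-first initialization, and the fuzzifier `m=2.0` are assumptions made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize each row; constant rows are mapped to zeros."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (x - mu) / sd

def pairwise_dist(x):
    """Z-normalized Euclidean distance between all pairs of rows.
    Assumption: used here as a simple stand-in for MASS 3."""
    z = znorm(x)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def init_medoids(dist, c):
    """Farthest-first initialization (an assumption, chosen to be deterministic)."""
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < c:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(nearest.argmax()))
    return np.array(medoids)

def memberships(dist, medoids, m):
    """Fuzzy memberships: u[j, i] is proportional to d(x_j, v_i)^(-1/(m-1))."""
    d = np.maximum(dist[:, medoids], 1e-12)  # avoid division by zero at medoids
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_medoids(dist, c, m=2.0, max_iter=50):
    """Alternate membership and medoid updates until the medoids stop changing."""
    medoids = init_medoids(dist, c)
    for _ in range(max_iter):
        u = memberships(dist, medoids, m)
        # cost[k, i] = sum_j u[j, i]^m * dist[k, j]; the best candidate k
        # becomes the new medoid of cluster i.
        cost = dist @ (u ** m)
        new_medoids = cost.argmin(axis=0)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, memberships(dist, medoids, m)
```

Given an array `X` with one time series per row, `fuzzy_c_medoids(pairwise_dist(X), c=2)` returns the indices of the medoid series and a membership matrix whose rows sum to one. Because medoids are always actual series from the dataset, only the precomputed distance matrix is needed, which is why a fast similarity measure matters for overall runtime.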