A robust alternating least squares K-means clustering approach for times series using dynamic time warping dissimilarities.

Author information

Vera-Vera J Fernando, Roldán-Nofuentes J Antonio

Affiliation

Department of Statistics and O.R., University of Granada, Faculty of Sciences, Fuentenueva s/n, 18071, Granada, Spain.

Publication information

Math Biosci Eng. 2024 Feb 6;21(3):3631-3651. doi: 10.3934/mbe.2024160.

DOI:10.3934/mbe.2024160

PMID:38549299

Abstract

Time series clustering is a common task in many different areas. Algorithms such as K-means and model-based clustering rely on multivariate assumptions about the data, such as the use of Euclidean distances or a probabilistic distribution for the observed variables. However, in many cases the observed time series are of unequal length, contain missing data, or simply cover time periods that are not comparable with one another, which prevents the direct application of these methods. In this setting, dynamic time warping is a well-known and advisable elastic dissimilarity measure, particularly when the analysis focuses on the shape of the time series. Given a dissimilarity matrix, K-means clustering can be performed through a particular procedure based on classical multidimensional scaling in full dimension, which can lead to a high-dimensional clustering problem for large sample sizes. In this paper, we propose a procedure that is robust to dimensionality reduction, based on an auxiliary configuration estimated from the squared dynamic time warping dissimilarities by means of an alternating least squares procedure. The performance of the model is compared with that obtained using classical multidimensional scaling, as well as with that of model-based clustering using this related auxiliary linear projection. An extensive Monte Carlo study, considering both real and simulated datasets, is used to analyze the performance of the proposed method. The results indicate that the proposed K-means procedure generally improves slightly on the one based on the classical configuration, with both being robust in reduced dimensionality, which makes it advisable for large datasets. In contrast, model-based clustering in the classical projection is strongly affected by high dimensionality, yielding worse results than K-means, even in reduced dimension.
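The pipeline the abstract uses as its baseline (DTW dissimilarities, then a classical multidimensional scaling configuration, then K-means on that configuration) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' proposed alternating least squares method, which the abstract does not specify in enough detail to reproduce. The helper names `dtw_distance`, `dtw_matrix`, `classical_mds` and `kmeans` are hypothetical; the DTW variant (absolute-difference local cost, no warping window) and the deterministic farthest-first K-means initialisation are assumptions chosen for simplicity. Setting `dim` below the number of series mimics the reduced-dimension setting discussed in the abstract.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW (assumed variant): |a_i - b_j| local cost, no warping window."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW dissimilarities; series may differ in length."""
    k = len(series)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D

def classical_mds(D, dim):
    """Torgerson scaling: double-centre -D**2 / 2 and keep the top `dim` eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]
    # DTW is not Euclidean, so B can have negative eigenvalues; clip them to zero.
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-first initialisation (assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: rising and falling ramps of unequal lengths (20, 25, 30 points).
ups = [np.linspace(0, 1, n) for n in (20, 25, 30)]
downs = [np.linspace(1, 0, n) for n in (20, 25, 30)]
D = dtw_matrix(ups + downs)
X = classical_mds(D, dim=2)  # reduced-dimension configuration
labels, _ = kmeans(X, k=2)
print(labels)
```

Because DTW aligns series of different lengths, the three rising ramps end up close to one another and far from the falling ones, so the two shapes should fall into separate clusters even though no two series share a common time grid.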

Similar articles

On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling.

Psychometrika. 2021 Jun;86(2):489-513. doi: 10.1007/s11336-021-09757-2. Epub 2021 May 19.

Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.

Psychometrika. 2017 Jun;82(2):275-294. doi: 10.1007/s11336-017-9561-1. Epub 2017 Feb 13.

Modified multidimensional scaling approach to analyze financial markets.

Chaos. 2014 Jun;24(2):022102. doi: 10.1063/1.4873523.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Implementation and evaluation of a multivariate abstraction-based, interval-based dynamic time-warping method as a similarity measure for longitudinal medical records.

J Biomed Inform. 2021 Nov;123:103919. doi: 10.1016/j.jbi.2021.103919. Epub 2021 Oct 8.

Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method.

Entropy (Basel). 2022 Apr 29;24(5):626. doi: 10.3390/e24050626.

Somtimes: self organizing maps for time series clustering and its application to serious illness conversations.

Data Min Knowl Discov. 2024;38(3):813-839. doi: 10.1007/s10618-023-00979-9. Epub 2023 Oct 20.

A k-means method for trends of time series: An application to time series of COVID-19 cases in Japan.

Jpn J Stat Data Sci. 2022;5(1):303-319. doi: 10.1007/s42081-022-00148-0. Epub 2022 Mar 3.

Identifying cell types from single-cell data based on similarities and dissimilarities between cells.

BMC Bioinformatics. 2021 May 18;22(Suppl 3):255. doi: 10.1186/s12859-020-03873-z.
