Javed Ali, Rizzo Donna M, Lee Byung Suk, Gramling Robert
Department of Medicine, Stanford University, 300 Pasteur Dr, Stanford, CA 94305 USA.
Department of Computer Science, University of Vermont, Burlington, VT USA.
Data Min Knowl Discov. 2024;38(3):813-839. doi: 10.1007/s10618-023-00979-9. Epub 2023 Oct 20.
There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because the runtime complexity is quadratic with the length of the time series. To address this, we present a new a self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has similar accuracy compared with other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from the pruning of unnecessary DTW computations during the SOM's training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive, and show that for similar accuracy, a 1.8 speed-up on average for SOMTimeS and K-TimeS, respectively with that rates vary between 1 and 18 depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced natural language.
The online version contains supplementary material available at 10.1007/s10618-023-00979-9.
对于能够对大型时间序列数据进行聚类和分析的可扩展算法存在需求。科霍宁自组织映射(SOM)是一种用于聚类、可视化和降低复杂数据维度的无监督人工神经网络。与所有聚类方法一样,它需要一种输入数据(在本研究中为时间序列)之间的相似性度量。动态时间规整(DTW)就是这样一种度量,并且是在对齐时间序列时能够适应扭曲的顶级方法。尽管DTW在聚类中很受欢迎,但在实践中它受到限制,因为运行时复杂度与时间序列的长度呈二次关系。为了解决这个问题,我们提出了一种用于聚类时间序列的新自组织映射,称为SOMTimeS,它使用DTW作为距离度量。与其他基于DTW的聚类算法相比,该方法具有相似的准确性,但扩展性更好且运行速度更快。计算性能源于在SOM训练阶段对不必要的DTW计算进行修剪。为了进行比较,我们为K均值实现了类似的修剪策略,并将后者称为K-TimeS。SOMTimeS和K-TimeS分别修剪了总DTW计算的43%和50%。使用来自加州大学河滨分校分类存档的112个基准时间序列数据集评估了修剪效果、准确性、执行时间和可扩展性,结果表明,在相似准确性的情况下,SOMTimeS和K-TimeS平均分别加速了1.8倍,加速率在1到18之间,具体取决于数据集。我们还将SOMTimeS应用于一项关于患者与临床医生严重疾病对话的医疗保健研究,以证明该算法在复杂的、按时间顺序排列的自然语言中的实用性。
在线版本包含可在10.1007/s10618-023-00979-9获取的补充材料。