Mihailović Dragutin T, Nikolić-Đorić Emilija, Malinović-Milićević Slavica, Singh Vijay P, Mihailović Anja, Stošić Tatijana, Stošić Borko, Drešković Nusret
Faculty of Agriculture, University of Novi Sad, Dositej Obradovic Sq. 8, 21000 Novi Sad, Serbia.
ACIMSI-Center for Meteorology and Environmental Modeling, University of Novi Sad, Dositej Obradovic Sq. 7, 21000 Novi Sad, Serbia.
Entropy (Basel). 2019 Feb 23;21(2):215. doi: 10.3390/e21020215.
This paper aimed to choose an appropriate information dissimilarity measure for hierarchical clustering of daily streamflow discharge data from twelve gauging stations on the Brazos River in Texas (USA) for the period 1989-2016. To that end, we built dissimilarity matrices for the daily streamflow series from three measures, the compression-based dissimilarity measure (NCD), the permutation distribution dissimilarity measure (PDDM), and the Kolmogorov distance (KD), and applied the agglomerative average-linkage hierarchical algorithm to each. The resulting clusterings were also compared with K-means clustering based on Kolmogorov complexity (KC), the highest value of the Kolmogorov complexity spectrum (KCM), and the largest Lyapunov exponent (LLE). The key findings are: (i) of the measures compared, KD is the most suitable for clustering; (ii) ANOVA shows highly significant differences between the mean values of the four clusters, confirming that the number of clusters was chosen appropriately; and (iii) the clustering indicates that the predictability of Brazos River streamflow, given by the Lyapunov time (LT) corrected for randomness by the Kolmogorov time (KT), lies in the interval from two to five days.
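The pipeline described above (a pairwise dissimilarity matrix fed into agglomerative average-linkage clustering) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: each series is binarized by thresholding at its own mean, compressed length from `zlib` stands in for Kolmogorov complexity, and NCD is computed with the widely used normalized compression distance formula. The function names (`binarize`, `ncd`, `dissimilarity_matrix`, `average_linkage`) are hypothetical, and the authors' actual encoding, compressor, and software may differ.

```python
import zlib
from itertools import combinations


def binarize(series):
    """Encode a series as ASCII '0'/'1' bytes by thresholding at its mean (an assumption)."""
    mean = sum(series) / len(series)
    return "".join("1" if v >= mean else "0" for v in series).encode("ascii")


def compressed_size(data):
    """Compressed length in bytes, used here as a computable proxy for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_matrix(series_list):
    """Symmetric NCD matrix; the diagonal is set to zero by convention."""
    encoded = [binarize(s) for s in series_list]
    n = len(encoded)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = ncd(encoded[i], encoded[j])
    return dist


def average_linkage(dist, n_clusters):
    """Merge the two clusters with the smallest mean pairwise dissimilarity
    until only n_clusters remain; returns lists of item indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]
```

With twelve station series in a list `flows` (a hypothetical name), `average_linkage(dissimilarity_matrix(flows), 4)` would return four groups of station indices. Swapping `ncd` for another pairwise measure leaves the clustering step unchanged, which is what makes a comparison of dissimilarity measures under one linkage rule straightforward.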