Cuesta-Frau David
Technological Institute of Informatics, Universitat Politècnica de València, 03801 Alcoi Campus, Spain.
Entropy (Basel). 2020 Apr 25;22(5):494. doi: 10.3390/e22050494.
Despite its widely tested and proven usefulness, there is still room for improvement in the basic permutation entropy (PE) algorithm, as several subsequent studies have demonstrated in recent years. Some of these new methods try to address the well-known PE weaknesses, such as its focus on ordinal rather than amplitude information, and the possible detrimental impact of equal values found in subsequences. Other new methods address less specific weaknesses, such as the dependence of PE results on input parameter values, a common problem in many entropy calculation methods. The lack of discriminating power among classes in some cases is also a generic problem when entropy measures are used for data series classification. This last problem is the one specifically addressed in the present study. Toward that purpose, the classification performance of the standard PE method was first assessed by conducting several time series classification tests over a varied and diverse set of data. Then, this performance was reassessed using a new Shannon Entropy normalisation scheme proposed in this paper: normalise the relative frequencies in PE by the number of different ordinal patterns actually found in the time series, instead of by the theoretically expected number. According to the classification accuracy obtained, this last approach exhibited a higher class discriminating power. It was capable of finding significant differences in six out of seven experimental datasets, whereas the standard PE method only did so in four, and it also achieved better classification accuracy. It can be concluded that, by using the additional information provided by the number of forbidden/found patterns, it is possible to achieve a higher discriminating power than with the classical PE normalisation method. The resulting algorithm is also very similar to that of PE and very easy to implement.
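As a rough illustration of the idea described in the abstract, the sketch below computes standard permutation entropy and then normalises the Shannon entropy by the logarithm of the number of ordinal patterns actually observed, rather than by log(m!). This is one plausible reading of the proposed scheme, not the paper's exact formulation; the function name, parameters, and the use of `argsort` ranks to encode ordinal patterns are illustrative assumptions.

```python
import math
from collections import Counter

import numpy as np

def modified_permutation_entropy(x, m=3, tau=1):
    """Permutation entropy normalised by log(number of patterns found).

    Illustrative sketch: standard PE divides the Shannon entropy by
    log(m!), the theoretical maximum; here we divide by the log of the
    number of distinct ordinal patterns observed in the series, one
    interpretation of the normalisation proposed in the abstract.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau  # number of embedded subsequences
    # Encode each length-m subsequence by its ordinal pattern (rank order).
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * tau:tau])) for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))  # Shannon entropy of pattern frequencies
    # Normalise by the patterns actually found, not the m! theoretically possible.
    return h / math.log(len(patterns)) if len(patterns) > 1 else 0.0
```

A strictly monotone series exhibits a single ordinal pattern and yields 0, while a series whose observed patterns are equiprobable yields 1 under this normalisation, regardless of how many of the m! possible patterns are forbidden.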