Effect of embedding dimension on complexity measures in identifying Arrhythmia.

Author information

Udhayakumar Radhagayathri K, Karmakar Chandan, Palaniswami Marimuthu

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2016 Aug;2016:6230-6233. doi: 10.1109/EMBC.2016.7592152.

Abstract

Entropy measures such as Approximate entropy (ApEn) and Sample entropy (SampEn) are well-established tools for analyzing Heart Rate Variability (HRV) data. The critical parameters in these computations, namely the embedding dimension m and the tolerance r, are in most cases assumed to be 2 and 0.2 times the signal SD (standard deviation), respectively. Such assumptions do not hold equally well across data sets and therefore produce misleading results in many cases. Problems with r have been addressed by newer entropy measures such as Permutation entropy (PE), Fuzzy entropy (FuzzyEn) and Distribution entropy (DistEn), which eliminate, modify or replace r in the calculation. In contrast, the disadvantage of using a fixed, assumed value of m when such measures are used for data classification has yet to be investigated. Even the smallest variation in m may affect how much information is retrieved from HRV data, so it is important to analyze the different choices of m and their outcomes. In this study, we examine the classification performance of different entropy measures at four values of the embedding dimension, i.e., m = 2, 3, 4 and 5. Normal and Arrhythmic RR intervals taken at data lengths N ranging from 50 to 1000 were used for this purpose. At any choice of m, DistEn and PE are the best measures for classifying Arrhythmic data, with AUC (Area under the ROC curve) values as high as 0.94 and 1, respectively. However, PE performance becomes unstable with N for m > 3 (highest Δ being 0.3 at m = 5, where Δ is the difference between the maximum and minimum AUC). Irrespective of the choice of m, DistEn performance remains the most efficient and stable (highest Δ being only 0.03 at m = 4) for Arrhythmia classification. For all other entropy measures, it is recommended that the value of m be chosen with discretion to ensure stable and efficient classification performance.
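To illustrate how m enters these computations, the following is a minimal Python sketch of SampEn and PE evaluated at m = 2, 3, 4 and 5, together with helpers for AUC and Δ. It is not the authors' implementation. The function names, the time delay of 1, the Chebyshev distance with a "distance <= r" match rule, and the normalization of PE by log(m!) are all assumptions that the abstract does not specify, and the input series is synthetic noise rather than RR-interval data.

```python
import math

import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn with r = r_factor * SD and Chebyshev distance (assumed conventions)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def matched_pairs(dim):
        # Use the same n - m starting points for both template lengths.
        t = np.array([x[i:i + dim] for i in range(n - m)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        # Drop the self-matches on the diagonal and count each pair once.
        return (np.sum(d <= r) - len(t)) / 2

    b = matched_pairs(m)       # matches of length m
    a = matched_pairs(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # no matches: the estimate is undefined
    return -math.log(a / b)


def permutation_entropy(x, m=3):
    """PE of order m with delay 1, normalized to [0, 1] by log(m!) (assumed)."""
    x = np.asarray(x, dtype=float)
    # Ordinal pattern of each window = the index order that sorts it.
    patterns = np.array([np.argsort(x[i:i + m], kind="stable")
                         for i in range(len(x) - m + 1)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))


def auc(pos, neg):
    """AUC as the probability that a 'pos' score exceeds a 'neg' score (ties count 0.5)."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def auc_spread(aucs):
    """Delta: difference between the maximum and minimum AUC in a collection."""
    return max(aucs) - min(aucs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.normal(size=300)  # synthetic stand-in, NOT real RR-interval data
    for m in (2, 3, 4, 5):
        print(m, sample_entropy(series, m), permutation_entropy(series, m))
```

In a study of this kind, each entropy value would be computed per recording at a given m and N, `auc` would compare the values of the two groups, and `auc_spread` would summarize how much the AUC varies over N. If a measure is lower in the group labelled `pos`, the AUC falls below 0.5 and the two arguments can simply be swapped.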
