IEEE Trans Cybern. 2017 May;47(5):1198-1209. doi: 10.1109/TCYB.2016.2540657. Epub 2016 Mar 30.
Finding patterns in time series of sensor data for fault detection in a system has been a challenge. Because meaningful features and rules are rarely straightforward to discover directly from complex time series, data discretization is widely used to reduce data size while preserving meaningful features of the original data; the choice of appropriate discretization parameters is crucial. We therefore present a systematic discretization procedure for multivariate time series data that includes: 1) label definition, which considers the estimated distribution functions of the sensor signals and the trends of the signals' short-term variation, and 2) label assignment to a set of time segments, so that the state of a given system in each time segment is described as a discretized state vector. We formally define fault patterns and discretization problems, and conduct an empirical sensitivity analysis of the discretization parameters in finding the most informative fault patterns. We then investigate the relationship between the parameters and the key characteristic indicators of the sensor signals. Computational results on ten real-world data sets provide practical advice for selecting appropriate parameters.
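The two-step procedure in the abstract (label definition, then label assignment to time segments) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the label alphabet, the distribution estimator, or the trend test, so everything below is an assumption made for illustration. Levels are taken from empirical quantiles of each sensor, the trend is the net change over a segment compared against a threshold, and the names `quantile_cutpoints`, `trend_label`, `discretize`, `segment_length`, `n_levels`, and `trend_threshold` are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def quantile_cutpoints(values, n_levels):
    """Cut points at the empirical k/n_levels quantiles, k = 1..n_levels-1.

    Assumption: the empirical distribution stands in for the "estimated
    distribution function" mentioned in the abstract.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (k * n) // n_levels)] for k in range(1, n_levels)]


def trend_label(segment, threshold):
    """Return 'U' (up), 'D' (down) or 'S' (steady) from the net change."""
    change = segment[-1] - segment[0]
    if change > threshold:
        return "U"
    if change < -threshold:
        return "D"
    return "S"


def discretize(series_by_sensor, segment_length, n_levels, trend_threshold):
    """Return one discretized state vector (tuple of labels) per time segment.

    Each label combines a level index (from the sensor's own cut points,
    applied to the segment mean) with a trend symbol.
    """
    cuts = {
        name: quantile_cutpoints(vals, n_levels)
        for name, vals in series_by_sensor.items()
    }
    n_samples = min(len(vals) for vals in series_by_sensor.values())
    states = []
    # Non-overlapping segments; a trailing partial segment is dropped.
    for start in range(0, n_samples - segment_length + 1, segment_length):
        state = []
        for name, vals in series_by_sensor.items():
            seg = vals[start:start + segment_length]
            level = bisect_right(cuts[name], mean(seg))
            state.append(f"{level}{trend_label(seg, trend_threshold)}")
        states.append(tuple(state))
    return states


if __name__ == "__main__":
    # Toy data, invented for this example only.
    data = {
        "temperature": [1, 2, 3, 4, 5, 6, 7, 8],
        "pressure": [8, 8, 8, 8, 2, 2, 2, 2],
    }
    for state in discretize(data, segment_length=4, n_levels=2, trend_threshold=1):
        print(state)
```

In this sketch, `segment_length`, `n_levels`, and `trend_threshold` play the role of discretization parameters: changing any of them changes the resulting state vectors, which is the kind of sensitivity the abstract says the paper analyzes empirically.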