FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams.

Authors

Park Namuk, Kim Songkuk

Affiliation

School of Integrated Technology, Yonsei University, Incheon 21983, Korea.

Publication information

Sensors (Basel). 2021 Feb 4;21(4):1080. doi: 10.3390/s21041080.

Abstract

Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., when its probability distribution changes over time. Statistical models for non-stationary data streams must adapt quickly to concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram with every new data sample and generates a probability distribution by combining the histograms in the ensemble, while periodically monitoring the discrepancy between recent data and the existing models. When it detects concept drift, it adds a new histogram to the ensemble and removes the oldest one. This allows the probability density function to be estimated with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm improves on the speed and accuracy of existing methods for both stationary and non-stationary data streams.
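
The update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the histogram structure, the combination rule, or the drift test, so the fixed equal-width bins, the equal-weight average of histograms, the total variation distance with a fixed threshold, and every class and parameter name below (`Histogram`, `HistogramEnsemble`, `window`, `threshold`, `max_models`) are assumptions chosen for illustration.

```python
from collections import deque


class Histogram:
    """Equal-width histogram over a fixed range (an assumed, simplified layout)."""

    def __init__(self, lo, hi, bins):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.total = 0

    def _index(self, x):
        # Clamp out-of-range samples into the first or last bin.
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.bins - 1)

    def update(self, x):
        self.counts[self._index(x)] += 1
        self.total += 1

    def probs(self):
        # Per-bin probability mass; all zeros for an empty histogram.
        if self.total == 0:
            return [0.0] * self.bins
        return [c / self.total for c in self.counts]

    def pdf(self, x):
        # Density of the bin containing x: bin mass divided by bin width.
        if self.total == 0:
            return 0.0
        return self.counts[self._index(x)] / (self.total * self.width)


class HistogramEnsemble:
    """Hypothetical ensemble: histograms started at different times,
    so each one covers a different length of history."""

    def __init__(self, lo, hi, bins=20, max_models=3, window=200, threshold=0.5):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.max_models = max_models  # memory bound on the ensemble size
        self.window = window          # samples between drift checks (assumed)
        self.threshold = threshold    # distance treated as drift (assumed)
        self.models = deque([Histogram(lo, hi, bins)])
        self.recent = Histogram(lo, hi, bins)

    def _combined_probs(self):
        # Equal-weight average of the per-model bin masses (assumed rule).
        per_model = [h.probs() for h in self.models]
        return [sum(col) / len(per_model) for col in zip(*per_model)]

    def update(self, x):
        # Every histogram in the ensemble sees every new sample.
        for h in self.models:
            h.update(x)
        self.recent.update(x)
        if self.recent.total < self.window:
            return
        # Periodic check: total variation distance between the recent
        # window and the combined ensemble estimate.
        p, q = self.recent.probs(), self._combined_probs()
        distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if distance > self.threshold:
            # Drift: the recent window becomes the newest histogram,
            # and the oldest is dropped once the ensemble is full.
            self.models.append(self.recent)
            if len(self.models) > self.max_models:
                self.models.popleft()
        self.recent = Histogram(self.lo, self.hi, self.bins)

    def pdf(self, x):
        return sum(h.pdf(x) for h in self.models) / len(self.models)
```

In this sketch, memory is bounded by `max_models * bins` counters plus one window histogram, and each update touches one bin per histogram. A caller would feed samples through `update(x)` and query `pdf(x)` at any time; a larger `threshold` or `window` makes the sketch more tolerant of short-lived fluctuations at the cost of slower reaction to drift.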

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e204/7915800/e1cfb255246f/sensors-21-01080-g001.jpg
