Dipartimento di Informatica-Scienza e Ingegneria, University of Bologna, Viale del Risorgimento 2, 40136 Bologna, Italy.
Sensors (Basel). 2021 Jun 17;21(12):4160. doi: 10.3390/s21124160.
Large volumes of georeferenced data streams arrive daily at stream processing systems, driven by the abundance of affordable Internet of Things (IoT) devices. Practitioners, in turn, want to exploit IoT data streams for strategic decision-making. However, mobility data are highly skewed and their arrival rates fluctuate. These characteristics pose an extra challenge for data stream processing systems, which must meet pre-specified latency and accuracy goals. In this paper, we propose ApproxSSPS, a system for approximate processing of georeferenced mobility data at scale with quality-of-service guarantees. We focus on stateful aggregations (e.g., means, counts) and top-N queries. ApproxSSPS features a controller that interactively learns the latency statistics and calculates appropriate sampling rates to meet latency and/or accuracy targets. An overarching trait of ApproxSSPS is its ability to strike a reasonable balance between latency and accuracy targets. We evaluate ApproxSSPS on Apache Spark Structured Streaming with real mobility data, and we compare it against a state-of-the-art online adaptive processing system. Our extensive experiments show that ApproxSSPS can fulfill latency and accuracy targets under varying parameter configurations and load intensities (i.e., transient peaks in data load versus slowly arriving streams). Moreover, our results show that ApproxSSPS outperforms the baseline by a significant margin. In short, ApproxSSPS is a novel spatial data stream processing system that delivers accurate results in a timely manner by dynamically setting limits on the size of data samples.
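The abstract does not spell out how the controller turns latency statistics into a sampling rate. As a rough illustration only, the following Python sketch shows one simple way such a feedback loop could look. It is not the ApproxSSPS algorithm: the class name `SamplingController`, the exponentially weighted moving average, the linear cost model (latency proportional to records processed), and the `min_fraction` floor standing in for an accuracy target are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SamplingController:
    """Hypothetical feedback controller; NOT the ApproxSSPS algorithm."""

    latency_target_s: float          # per-batch latency goal (assumed)
    min_fraction: float = 0.05       # floor standing in for an accuracy goal (assumed)
    alpha: float = 0.3               # EWMA smoothing weight (assumed)
    cost_per_record_s: float = 0.0   # learned estimate; 0.0 means "no data yet"

    def observe(self, records_processed: int, latency_s: float) -> None:
        """Update the per-record cost estimate from one finished batch."""
        if records_processed <= 0:
            return
        sample = latency_s / records_processed
        if self.cost_per_record_s == 0.0:
            # First observation: adopt it directly.
            self.cost_per_record_s = sample
        else:
            # Later observations: blend with the running estimate.
            self.cost_per_record_s = (
                self.alpha * sample + (1 - self.alpha) * self.cost_per_record_s
            )

    def next_fraction(self, incoming_records: int) -> float:
        """Pick the sampling fraction for the next batch."""
        if incoming_records <= 0 or self.cost_per_record_s == 0.0:
            return 1.0  # nothing learned yet, or nothing to sample
        # Number of records the latency budget can afford under the linear model.
        affordable = self.latency_target_s / self.cost_per_record_s
        fraction = affordable / incoming_records
        # Clamp: never above 100%, never below the accuracy floor.
        return max(self.min_fraction, min(1.0, fraction))


controller = SamplingController(latency_target_s=1.0)
controller.observe(records_processed=1000, latency_s=2.0)  # batch ran too slowly
fraction = controller.next_fraction(incoming_records=1000)
```

Under these assumptions, a batch that took twice the latency target leads the controller to sample roughly half of an equally sized next batch, while the floor keeps the sample from shrinking without bound during a load peak. When the floor is active the latency target may be missed, which is one simple way of expressing the latency versus accuracy trade-off the abstract mentions.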