A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring.

Affiliation

Centre for Sustainable Smart Cities, Central University of Technology, Free State 9300, South Africa.

Publication details

Sensors (Basel). 2020 Jun 3;20(11):3166. doi: 10.3390/s20113166.

Abstract

In recent years, the wide adoption of Internet of Things (IoT)-based technologies has driven a proliferation of monitoring systems, which has in turn exponentially increased the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and practice is gradually moving from the classical 'batch' processing technique of extract, transform, load (ETL) to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicate real-time analysis of the essential data and integration with big data platforms, and force a reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics using Kafka Connect APIs for processing by the stream processing engine. The stream processing engine executes the predictive numerical models and algorithms, represented in event processing (EP) languages, for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. Finally, the performance of the distributed stream processing middleware infrastructure is evaluated to determine the real-time effectiveness of the framework.
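
To make the idea of "persistent querying of the infinite streams" more concrete, the following is a minimal, illustrative sketch in plain Python; it is not the authors' implementation and does not use the streaming engine or any EP language. It assumes the commonly cited definition of effective precipitation from the EDI literature, EP = Σₙ (P₁ + … + Pₙ)/n, where P₁ is the most recent daily rainfall total, and standardises it as EDI = (EP − mean EP) / SD. The function names, the window length, and the climatological `mean_ep` and `sd_dep` values are hypothetical; the EDI literature typically uses a 365-day window, shortened here for readability.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def effective_precipitation(recent_first: Iterable[float]) -> float:
    """Sum, over n, of the mean of the n most recent daily totals."""
    ep = 0.0
    running_total = 0.0
    for n, rainfall in enumerate(recent_first, start=1):
        running_total += rainfall
        ep += running_total / n
    return ep


def edi(ep: float, mean_ep: float, sd_dep: float) -> float:
    """Standardised deviation of EP from its climatological mean.

    mean_ep and sd_dep are assumed to come from a historical dataset.
    """
    return (ep - mean_ep) / sd_dep


def edi_stream(
    readings: Iterable[float], window: int, mean_ep: float, sd_dep: float
) -> Iterator[Tuple[int, float]]:
    """Continuously evaluate EDI over a sliding window of daily readings."""
    buffer: Deque[float] = deque(maxlen=window)
    for day, rainfall in enumerate(readings):
        buffer.appendleft(rainfall)  # newest reading first; oldest is dropped
        if len(buffer) == window:    # emit only once the window is full
            yield day, edi(effective_precipitation(buffer), mean_ep, sd_dep)


if __name__ == "__main__":
    # Hypothetical daily rainfall totals (mm) and climatology values.
    for day, value in edi_stream([6.0, 0.0, 3.0, 0.0], 3, 7.5, 2.5):
        print(day, value)
```

Because `edi_stream` is a generator, it can consume an unbounded iterator of readings, which loosely mirrors a standing query over a topic: each new reading updates the window and yields a fresh index value that a downstream rule could compare against a drought threshold.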

