Department of Electronic Technology, Escuela Técnica Superior de Ingeniería Informática, Universidad de Sevilla, 41012 Sevilla, Spain.
Department of Electronic Technology, Escuela Politécnica Superior, Universidad de Sevilla, 41011 Sevilla, Spain.
Sensors (Basel). 2023 Apr 9;23(8):3845. doi: 10.3390/s23083845.
In many data landscapes, information is distributed across various sources and presented in diverse formats. This fragmentation poses a significant challenge to the efficient application of analytical methods. Distributed data mining therefore relies mainly on clustering or classification techniques, which are easier to implement in distributed environments. However, some problems are solved with mathematical equations or stochastic models, which are more difficult to implement in such environments. These problems usually require the information to be centralized before a modelling technique is applied. In some environments, this centralization may overload the communication channels because of massive data transmission, and it may raise privacy issues when sensitive data are sent. To mitigate this problem, this paper describes a general-purpose distributed analytic platform based on edge computing for distributed networks. Through the distributed analytical engine (DAE), the calculation of expressions that require data from diverse sources is decomposed and distributed among the existing nodes, so that nodes send partial results without exchanging the original information. In this way, the master node ultimately obtains the result of the expressions. The proposed solution is examined using three computational intelligence algorithms (a genetic algorithm, a genetic algorithm with evolution control, and particle swarm optimization) to decompose the expression to be calculated and to distribute the calculation tasks among the existing nodes. The engine has been successfully applied in a case study on the calculation of key performance indicators of a smart grid, reducing the number of communication messages by more than 91% compared to the traditional approach.
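The idea of sending partial results instead of raw data can be illustrated with a minimal sketch. It is not the DAE described in the paper and does not use a genetic algorithm or particle swarm optimization to choose the decomposition. It assumes a hand-decomposed expression (a mean over readings held by several nodes), and the names `Node`, `distributed_mean`, and `centralized_mean` are hypothetical. It also assumes, for the baseline, that each raw reading costs one message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical edge node; its raw readings are meant to stay local."""
    name: str
    readings: List[float]

    def partial_sum_and_count(self) -> Tuple[float, int]:
        # Only this aggregate leaves the node, never the raw readings.
        return sum(self.readings), len(self.readings)


def distributed_mean(nodes: List[Node]) -> Tuple[float, int]:
    """Master combines one partial result per node; returns (mean, messages)."""
    total, count, messages = 0.0, 0, 0
    for node in nodes:
        partial_sum, partial_count = node.partial_sum_and_count()
        total += partial_sum
        count += partial_count
        messages += 1  # one message per node
    return total / count, messages


def centralized_mean(nodes: List[Node]) -> Tuple[float, int]:
    """Baseline (assumption): every raw reading is sent as its own message."""
    values = [value for node in nodes for value in node.readings]
    return sum(values) / len(values), len(values)


# Illustrative data only; not taken from the paper's smart grid case study.
nodes = [
    Node("a", [1.0, 2.0, 3.0]),
    Node("b", [4.0, 5.0]),
    Node("c", [6.0, 7.0, 8.0, 9.0]),
]
```

Both functions return the same mean for `nodes`, but the distributed version uses one message per node rather than one per reading, and the master never sees the individual readings. The saving in this toy case depends entirely on the made-up data and says nothing about the 91% figure reported for the case study.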