Putrama I Made, Martinek Péter
Department of Electronics Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary.
Department of Informatics, Faculty of Engineering and Vocational, Universitas Pendidikan Ganesha, Singaraja, Indonesia.
Data Brief. 2024 Aug 29;56:110853. doi: 10.1016/j.dib.2024.110853. eCollection 2024 Oct.
Integrating multiple data source technologies is essential for organizations to respond to highly dynamic market needs. Although physical data integration systems are generally considered to offer better query processing, they incur higher implementation and maintenance costs. Meanwhile, virtual data integration has become an alternative that is attracting growing attention from researchers in the current era of big data. Various data integration methodologies have been developed and applied across domains to process heterogeneous data using different approaches. This review article provides an overview of heterogeneous data integration research, focusing on methodologies and approaches. It surveys existing publications and highlights key trends, challenges, and open research topics. The main findings are: (i) research has been conducted in various domains, but most of it focuses on big data rather than on specific study domains; (ii) researchers primarily focus on semantic challenges; and (iii) gaps remain in integration issues involving semantics and unstructured data formats, which still need to be thoroughly addressed. Furthermore, cutting-edge technologies such as machine learning, together with privacy concerns in data integration, offer opportunities for further investigation. Finally, we provide insight into the potential for a broader review of integration challenges based on case studies.