Fusco Giuseppe, Aversano Lerina
Department of Engineering, University of Sannio, Benevento, BN, Italia.
PeerJ Comput Sci. 2020 Mar 2;6:e254. doi: 10.7717/peerj-cs.254. eCollection 2020.
Integrating data from multiple heterogeneous sources means handling data that may be structured, semi-structured or unstructured, and providing the user with a unified view of it. Gathering this information is challenging in general, largely because each data source is designed to support a specific application and its structure is often unknown to most users. Moreover, the stored data is often redundant, mixed with information needed only to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present DIF (Data Integration Framework), an approach for the semantic integration of heterogeneous data sources, together with a software prototype that supports all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both the Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both the data sources to be integrated and the global view.