Heterogeneous data integration: Challenges and opportunities.

Author information

Putrama I Made, Martinek Péter

Affiliations

Department of Electronics Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary.

Department of Informatics, Faculty of Engineering and Vocational, Universitas Pendidikan Ganesha, Singaraja, Indonesia.

Publication information

Data Brief. 2024 Aug 29;56:110853. doi: 10.1016/j.dib.2024.110853. eCollection 2024 Oct.

DOI: 10.1016/j.dib.2024.110853
PMID: 39286416
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11402636/
Abstract

Integrating multiple data source technologies is essential for organizations that must respond to highly dynamic market needs. Although physical data integration systems are generally considered to offer better query processing, they incur higher implementation and maintenance costs. Meanwhile, virtual data integration has emerged as an alternative that is attracting increasing attention from researchers in the current era of big data. Various data integration methodologies have been developed and applied across domains to process heterogeneous data. This review article provides an overview of heterogeneous data integration research, focusing on methodologies and approaches. It surveys existing publications and highlights key trends, challenges, and open research topics. The main findings are: (i) research has been conducted in various domains, but most of it focuses on big data in general rather than on a specific study domain; (ii) researchers primarily focus on semantic challenges; and (iii) gaps remain in integration issues involving semantics and unstructured data formats, and these still need to be thoroughly addressed. Furthermore, considering cutting-edge technologies such as machine learning alongside data integration, with attention to privacy concerns, offers opportunities for further investigation. Finally, we provide insight into the potential for a broader review of integration challenges based on case studies.
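The trade-off between physical and virtual integration described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article. The two sources (a small SQLite table and a list of dictionaries), the schema, and the function names `make_crm`, `physical_integration`, `spend_physical`, and `spend_virtual` are hypothetical and chosen only for illustration. Physical integration is approximated here as copying both sources into one store before querying. Virtual integration is approximated as a mediator function that queries each source in place at request time.

```python
import sqlite3

# Hypothetical source 1: a relational table of customers.
def make_crm():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ana"), (2, "Budi")])
    return conn

# Hypothetical source 2: semi-structured order records (e.g. parsed JSON).
ORDERS = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 1, "total": 12.5},
    {"customer_id": 2, "total": 8.0},
]

def physical_integration(crm, orders):
    """Copy both sources into a single store up front, under one schema."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    wh.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    wh.executemany("INSERT INTO customers VALUES (?, ?)",
                   crm.execute("SELECT id, full_name FROM customers"))
    wh.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(o["customer_id"], o["total"]) for o in orders])
    return wh

def spend_physical(wh, name):
    """One local join over the copied data."""
    row = wh.execute(
        "SELECT COALESCE(SUM(o.total), 0) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id WHERE c.name = ?",
        (name,),
    ).fetchone()
    return row[0]

def spend_virtual(crm, orders, name):
    """Mediator: leave the data in place and combine results per request."""
    ids = {r[0] for r in crm.execute(
        "SELECT id FROM customers WHERE full_name = ?", (name,))}
    return sum(o["total"] for o in orders if o["customer_id"] in ids)

crm = make_crm()
warehouse = physical_integration(crm, ORDERS)
print(spend_physical(warehouse, "Ana"))
print(spend_virtual(crm, ORDERS, "Ana"))
```

In this toy setup both paths return the same answer at first. If a new order is later appended to `ORDERS`, `spend_virtual` reflects it immediately, while the copied store stays stale until it is reloaded. That is one simple way to picture the maintenance cost the abstract attributes to physical integration.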


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab1/11402636/18b67a35edb4/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab1/11402636/732fbbc0dfa2/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab1/11402636/6fb50a81361a/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eab1/11402636/c20f8e2f6159/gr4.jpg

Similar articles

1
Heterogeneous data integration: Challenges and opportunities.
Data Brief. 2024 Aug 29;56:110853. doi: 10.1016/j.dib.2024.110853. eCollection 2024 Oct.
2
The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
3
ASAS-NANP symposium: mathematical modeling in animal nutrition-Making sense of big data and machine learning: how open-source code can advance training of animal scientists.
J Anim Sci. 2023 Jan 3;101. doi: 10.1093/jas/skad317.
4
Toward a view-oriented approach for aligning RDF-based biomedical repositories.
Methods Inf Med. 2015;54(1):50-5. doi: 10.3414/ME13-02-0020. Epub 2014 Apr 29.
5
Correlation Aware Relevance-Based Semantic Index for Clinical Big Data Repository.
J Imaging Inform Med. 2024 Oct;37(5):2597-2611. doi: 10.1007/s10278-024-01095-w. Epub 2024 Apr 23.
6
Semantic Integration and Enrichment of Heterogeneous Biological Databases.
Methods Mol Biol. 2019;1910:655-690. doi: 10.1007/978-1-4939-9074-0_22.
7
8
Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.
PLoS One. 2016 Aug 5;11(8):e0157077. doi: 10.1371/journal.pone.0157077. eCollection 2016.
9
Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).
Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.
10
How can Big Data Analytics Support People-Centred and Integrated Health Services: A Scoping Review.
Int J Integr Care. 2022 Jun 16;22(2):23. doi: 10.5334/ijic.5543. eCollection 2022 Apr-Jun.

Cited by

1
Multimodal data-driven approaches in retinal vein occlusion: A narrative review integrating machine learning and bioinformatics.
Adv Ophthalmol Pract Res. 2025 Jul 14;5(4):235-244. doi: 10.1016/j.aopr.2025.07.002. eCollection 2025 Nov-Dec.
2
Advancing Health Care With Digital Twins: Meta-Review of Applications and Implementation Challenges.
J Med Internet Res. 2025 Feb 19;27:e69544. doi: 10.2196/69544.
