Review of open-source software for developing heterogeneous data management systems for bioinformatics applications.

Authors

Silva Danilo, Moir Monika, Dunaiski Marcel, Blanco Natalia, Murtala-Ibrahim Fati, Baxter Cheryl, de Oliveira Tulio, Xavier Joicymara S

Affiliations

Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, 7602, South Africa.

Computer Science Division, Department of Mathematical Sciences, Faculty of Science, Stellenbosch University, Stellenbosch, 7602, South Africa.

Publication information

Bioinform Adv. 2025 Jul 18;5(1):vbaf168. doi: 10.1093/bioadv/vbaf168. eCollection 2025.

DOI: 10.1093/bioadv/vbaf168
PMID: 40761326
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12321290/

Abstract

SUMMARY

In a world where data drive effective decision-making, bioinformatics and health science researchers often encounter difficulties managing data efficiently. In these fields, data are typically diverse in format and subject. Consequently, challenges in storing, tracking, and responsibly sharing valuable data have become increasingly evident over the past decades. To address the complexities, some approaches have leveraged standard strategies, such as using non-relational databases and data warehouses. However, these approaches often fall short in providing the flexibility and scalability required for complex projects. While the data lake paradigm has emerged to offer flexibility and handle large volumes of diverse data, it lacks robust data governance and organization. The data lakehouse is a new paradigm that combines the flexibility of a data lake with the governance of a data warehouse, offering a promising solution for managing heterogeneous data in bioinformatics. However, the lakehouse model remains unexplored in bioinformatics, with limited discussion in the current literature. In this study, we review strategies and tools for developing a data lakehouse infrastructure tailored to bioinformatics research. We summarize key concepts and assess available open-source and commercial solutions for managing data in bioinformatics.

AVAILABILITY AND IMPLEMENTATION

Not applicable.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d975/12321290/a0e2675994b7/vbaf168f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d975/12321290/40f144558216/vbaf168f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d975/12321290/164c5af43c1b/vbaf168f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d975/12321290/6310fb82cffc/vbaf168f4.jpg
