Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services.

Author information

Chrimes Dillon, Zamani Hamid

Affiliations

Database Integration and Management, IMIT Quality Systems, Vancouver Island Health Authority, Vancouver, BC, Canada V8R 1J8.

School of Health Information Science, Faculty of Human and Social Development, University of Victoria, Victoria, BC, Canada V8P 5C2.

Publication information

Comput Math Methods Med. 2017;2017:6120820. doi: 10.1155/2017/6120820. Epub 2017 Dec 11.

DOI: 10.1155/2017/6120820

PMID: 29375652

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5742497/

Abstract

Big data analytics (BDA) is important for reducing healthcare costs. However, there are many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The study objective, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At the optimized iteration, HDFS ingestion of HFiles into HBase store files showed sustained availability over hundreds of iterations; however, completing the MapReduce load into HBase required a week for 10 TB and a month for three billion indexed patient records (30 TB). Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance and high usability for technical support but poor usability for clinical services. Representing the hospital system with patient-centric data was challenging in HBase, because not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to secure patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.
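The load times reported in the abstract can be sanity-checked with simple arithmetic. The sketch below is not from the paper: it assumes decimal units (1 TB = 10^12 bytes) and continuous 7-day and 30-day wall-clock periods, and computes the implied average size per indexed record and the implied average write rate. The helper names `throughput_mb_per_s` and `bytes_per_record` are hypothetical.

```python
# Back-of-envelope check of the ingestion figures quoted in the abstract.
# Assumptions (not stated in the paper): decimal units (1 TB = 10**12 bytes),
# a 7-day week and a 30-day month of continuous wall-clock time.

TB = 10**12
WEEK_S = 7 * 24 * 3600     # 604,800 seconds
MONTH_S = 30 * 24 * 3600   # 2,592,000 seconds


def throughput_mb_per_s(size_tb: float, seconds: int) -> float:
    """Average rate in MB/s (10**6 bytes) needed to move size_tb terabytes in `seconds`."""
    return size_tb * TB / seconds / 10**6


def bytes_per_record(size_tb: float, records: int) -> float:
    """Average stored size per indexed record, in bytes."""
    return size_tb * TB / records


if __name__ == "__main__":
    # 30 TB for three billion indexed patient records.
    per_record = bytes_per_record(30, 3_000_000_000)
    print(f"~{per_record / 1000:.0f} KB per record")
    print(f"10 TB in a week:  ~{throughput_mb_per_s(10, WEEK_S):.1f} MB/s")
    print(f"30 TB in a month: ~{throughput_mb_per_s(30, MONTH_S):.1f} MB/s")
```

Under these assumptions the 30 TB run implies roughly 10 KB per indexed record, and the average rate is about 16.5 MB/s for the week-long 10 TB run versus about 11.6 MB/s for the month-long 30 TB run. These are wall-clock averages derived from the quoted durations, not measured throughput.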

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d368/5742497/43ffbf3a8738/CMMM2017-6120820.001.jpg