Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services.

Author information

Chrimes Dillon, Zamani Hamid

Affiliations

Database Integration and Management, IMIT Quality Systems, Vancouver Island Health Authority, Vancouver, BC, Canada V8R 1J8.

School of Health Information Science, Faculty of Human and Social Development, University of Victoria, Victoria, BC, Canada V8P 5C2.

Publication information

Comput Math Methods Med. 2017;2017:6120820. doi: 10.1155/2017/6120820. Epub 2017 Dec 11.

Abstract

Big data analytics (BDA) is important for reducing healthcare costs. However, data aggregation, maintenance, integration, translation, analysis, and security/privacy pose many challenges. The study objective, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files revealed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance with high usability for technical support but poor usability for clinical services. Building a hospital system on patient-centric data was challenging with HBase, as not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to keep patient data secure while querying entire hospital volumes in a simplified clinical event model across clinical services.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d368/5742497/43ffbf3a8738/CMMM2017-6120820.001.jpg
