Yang Chao-Tung, Liu Jung-Chun, Chen Shuo-Tsung, Lu Hsin-Wen
Department of Computer Science, Tunghai University, Taichung, 40704, Taiwan, Republic of China.
J Med Syst. 2017 Aug 18;41(10):149. doi: 10.1007/s10916-017-0777-5.
Big data analysis has become a key factor in innovation and competitiveness. With worldwide population growth and the aging of populations in developed countries, national medical care usage has been rising. Because an individual's medical data are usually scattered across different institutions and stored in varied formats, integrating these ever-growing data is challenging. To give such data platforms scalable load capacity, they must be built on a sound platform architecture. Several issues must be addressed when using cloud computing to quickly integrate big medical data into a database so that the data can be easily analyzed, searched, and filtered to obtain valuable information.

This work builds a cloud storage system on HBase and Hadoop for storing and analyzing big medical-record data, and improves the performance of importing data into the database. The medical records are stored on the HBase database platform for big data analysis. The system performs distributed processing of medical-record data with Hadoop MapReduce programs and provides keyword search, data filtering, and basic statistics over the HBase database. It imports medical data using two methods: single-threaded Put and the CompleteBulkload mechanism. The experimental results show that import performance is improved by using single-threaded Put when the file size is less than 300 MB and the CompleteBulkload mechanism when the file size is larger than 300 MB. The system also provides a web interface through which users can search data, filter out meaningful information, and analyze and convert data into suitable forms, which should be helpful to medical staff and institutions.
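The MapReduce processing described above (keyword search, data filtering, and basic statistics) can be illustrated with a minimal in-process sketch. It is not the authors' implementation and uses neither Hadoop nor HBase. It assumes a hypothetical comma-separated record layout (`patient_id,visit_date,diagnosis,hospital`), and the names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical stand-ins for the map, shuffle, and reduce steps that a Hadoop job would distribute across nodes. The mapper keeps only records that contain a keyword and emits `(group value, 1)` pairs. The reducer then sums those pairs per key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (assumption, not the paper's schema):
# one comma-separated line per visit: patient_id,visit_date,diagnosis,hospital
FIELDS = ["patient_id", "visit_date", "diagnosis", "hospital"]


def map_phase(line: str, keyword: str, group_field: str) -> Iterator[Tuple[str, int]]:
    """Mapper: parse one record, keep it only if some field contains the
    keyword (keyword search / filtering), and emit (group value, 1)."""
    record = dict(zip(FIELDS, line.strip().split(",")))
    if any(keyword.lower() in value.lower() for value in record.values()):
        yield record[group_field], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does
    between the map and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one key (a basic statistic)."""
    return key, sum(values)


def run_job(lines: Iterable[str], keyword: str, group_field: str) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process."""
    intermediate = (
        pair for line in lines for pair in map_phase(line, keyword, group_field)
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    records = [
        "p001,2016-01-05,diabetes,Hospital A",
        "p002,2016-01-07,hypertension,Hospital B",
        "p003,2016-02-11,diabetes,Hospital B",
        "p004,2016-02-20,diabetes,Hospital A",
    ]
    # Count diabetes-related visits per hospital.
    print(run_job(records, keyword="diabetes", group_field="hospital"))
    # prints {'Hospital A': 2, 'Hospital B': 1}
```

In a real deployment, the mapper would read rows from HBase through HBase's MapReduce integration instead of a Python list, and Hadoop would perform the shuffle across the cluster. The map, shuffle, and reduce split stays the same.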