Xiao Fuyuan, Aritsugi Masayoshi, Wang Qing, Zhang Rong
School of Computer and Information Science, Southwest University, No. 2 Tiansheng Road, BeiBei District, Chongqing 400715, PR China.
Big Data Science and Technology, Division of Environmental Science, Faculty of Advanced Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan.
Artif Intell Med. 2016 Sep;72:56-71. doi: 10.1016/j.artmed.2016.08.002. Epub 2016 Aug 19.
This paper proposes a triaxial hierarchical model for efficient and sophisticated analysis of complex event patterns that appear in streams of big data from health care information systems, and for supporting decision-making.
Our triaxial hierarchical model is developed by focusing on the hierarchies among nested event pattern queries together with an event concept hierarchy, which allows us to identify the relationships among the expressions and sub-expressions of the queries. Using the triaxial hierarchical model, we devise a cost-based heuristic to find an optimised query execution plan in terms of the costs of both the operators and the communications between them. The model also lets us determine how to reuse the results of sub-expressions that are common to multiple queries. By integrating the optimised query execution plan with the reuse schemes, we develop a multi-query optimisation strategy for efficient processing of multiple nested event pattern queries.
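The idea of reusing common sub-expressions across nested pattern queries can be illustrated with a minimal sketch. This is not the authors' triaxial hierarchical model or their cost-based heuristic: it assumes queries are written as nested tuples, it ignores the event concept hierarchy and communication costs, and the operator names (`SEQ`, `AND`) and event types (`HighHeartRate`, `LowSpO2`, `Fever`, `Alarm`) are hypothetical. It only shows how shared sub-expressions might be found and how sharing reduces the number of operators to evaluate.

```python
from collections import Counter

# Assumption: a query is a nested tuple (operator, operand, operand, ...),
# where each operand is an event type name (str) or another nested tuple.

def subexpressions(expr):
    """Yield every composite sub-expression of expr, including expr itself."""
    if isinstance(expr, tuple):
        yield expr
        for operand in expr[1:]:
            yield from subexpressions(operand)

def shared_subexpressions(queries):
    """Return the sub-expressions that occur in at least two queries."""
    counts = Counter()
    for query in queries:
        # Count each sub-expression once per query.
        counts.update(set(subexpressions(query)))
    return {expr for expr, n in counts.items() if n >= 2}

def operator_count(queries, reuse):
    """Count operators to evaluate, with or without sharing identical ones."""
    nodes = [expr for query in queries for expr in subexpressions(query)]
    return len(set(nodes)) if reuse else len(nodes)

# Hypothetical patient-monitoring queries.
q1 = ("SEQ", "HighHeartRate", ("AND", "LowSpO2", "Fever"))
q2 = ("SEQ", ("AND", "LowSpO2", "Fever"), "Alarm")
queries = [q1, q2]

shared = shared_subexpressions(queries)
print(shared)
print(operator_count(queries, reuse=False), operator_count(queries, reuse=True))
```

Here the two queries share the `AND` sub-expression, so evaluating it once and feeding its result to both `SEQ` operators removes one operator from the combined plan. A real optimiser would weigh that saving against operator and communication costs, as the abstract describes.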
We present empirical studies in which the performance of the multi-query optimisation strategy was examined under various stream input rates and workloads. Specifically, the workloads consist of pattern queries that can support the monitoring of patients' conditions. Experiments with varying stream input rates correspond to changes in the number of patients that a system must manage, whereas burst input rates correspond to sudden rushes of patients needing care. The experimental results show that, compared with the related works, our proposal improves throughput by about 4 and 2 times, respectively, in Workload 1; by about 3 and 2 times, respectively, in Workload 2; and by about 6 times over the related work in Workload 3.
The experimental results demonstrate that our proposal can process complex queries efficiently, which can support health information systems and further decision-making.