Zhang Yong, Sheng Ming, Liu Xingyue, Wang Ruoyu, Lin Weihang, Ren Peng, Wang Xia, Zhao Enlai, Song Wenchao
BNRist, DCST, RIIT, Tsinghua University, Beijing, 100084 China.
Beihang University, Beijing, 102206 China.
Health Inf Sci Syst. 2022 Aug 26;10(1):22. doi: 10.1007/s13755-022-00183-x. eCollection 2022 Dec.
The Industry 4.0 era has seen more and more high-tech, precise devices applied in the medical field to provide better services. Besides electronic medical records (EMRs), medical data include a large and continually growing amount of unstructured data such as X-rays, MRI scans, CT scans, and PET scans. These massive, heterogeneous multi-modal data make it very challenging for healthcare researchers and other users to find valuable data sets. Traditional data warehouses can integrate the data and support interactive data exploration through an extract-transform-load (ETL) process. However, they are costly and do not operate in real time. Furthermore, they lack the ability to deal with multi-modal data in two phases: data fusion and data exploration. In the data fusion phase, it is difficult to unify the multi-modal data under one data model. In the data exploration phase, it is challenging to explore multi-modal data simultaneously, which impedes the extraction of the diverse information underlying the data. To solve these problems, we propose a highly efficient data fusion framework, based on a data lake, that supports data exploration for heterogeneous multi-modal medical data. The framework provides a novel and efficient method to fuse fragmented multi-modal medical data and store their metadata in the data lake. It offers a user-friendly interface supporting hybrid graph queries to explore multi-modal data, and indexes are created to accelerate this hybrid data exploration. A prototype has been implemented and tested in a hospital, which demonstrates the effectiveness of our framework.