
A multi-source heterogeneous medical data enhancement framework based on lakehouse.

Author information

Sheng Ming, Wang Shuliang, Zhang Yong, Hao Rui, Liang Ye, Luo Yi, Yang Wenhan, Wang Jincheng, Li Yinan, Zheng Wenkui, Li Wenyao

Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081 China.

BNRist, DCST, RIIT, Tsinghua University, Beijing, 100084 China.

Publication information

Health Inf Sci Syst. 2024 Jul 5;12(1):37. doi: 10.1007/s13755-024-00295-6. eCollection 2024 Dec.

Abstract

Obtaining high-quality datasets from raw data is a key step before data exploration and analysis. In the medical domain, large amounts of data need quality improvement before they can be used to analyze patients' health conditions. Data extraction, data cleaning, and data imputation have each been studied extensively. However, few frameworks integrate all three techniques, so the resulting datasets suffer in accuracy, consistency, and integrity. This paper proposes a multi-source heterogeneous data enhancement framework built on the lakehouse MHDP, comprising three steps: data extraction, data cleaning, and data imputation. In the data extraction step, a data fusion technique handles multi-modal and multi-source heterogeneous data. In the data cleaning step, we propose HoloCleanX, which provides a convenient interactive procedure. In the data imputation step, multiple imputation (MI) and the state-of-the-art (SOTA) algorithm SAITS are applied in different situations. We evaluate the framework on three tasks: clustering, classification, and strategy prediction. The experimental results demonstrate the effectiveness of the data enhancement framework.
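To make the imputation step more concrete, the following is a minimal sketch of the general idea behind multiple imputation (MI): fill each missing value several times with random draws, analyze every completed copy, and pool the results. It is not the paper's implementation, which the abstract does not describe. The function name `multiple_imputation_mean`, the heart-rate readings, and the choice of a normal distribution fitted to the observed values are all assumptions made for illustration.

```python
import random
import statistics

def multiple_imputation_mean(values, m=5, seed=0):
    """Pool the mean of `values` over m stochastically imputed copies.

    Simplification (an assumption of this sketch): missing entries (None)
    are drawn from a normal distribution fitted to the observed entries.
    A full MI procedure would also propagate uncertainty in the fitted
    parameters and usually condition on other variables.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)  # needs at least two observed values

    estimates = []
    for _ in range(m):
        # Build one completed dataset by filling each gap with a random draw.
        completed = [v if v is not None else rng.gauss(mu, sigma) for v in values]
        estimates.append(statistics.mean(completed))

    # Pooled point estimate: the average of the m per-dataset estimates.
    pooled = statistics.mean(estimates)
    # Between-imputation variance: how much the estimate depends on the draws.
    between_var = statistics.variance(estimates) if m > 1 else 0.0
    return pooled, between_var

# Hypothetical heart-rate readings with two missing entries.
heart_rate = [72.0, 75.0, None, 80.0, 78.0, None, 74.0]
pooled_mean, between_var = multiple_imputation_mean(heart_rate, m=5, seed=42)
print(f"pooled mean = {pooled_mean:.2f}, between-imputation variance = {between_var:.4f}")
```

A larger between-imputation variance suggests the missing entries have more influence on the estimate. Time-series models such as SAITS address the same gap-filling problem with a learned model rather than random draws.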



