Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Sankt Augustin, Germany.
Knowledge Management, ZB MED - Information Centre for Life Sciences, Cologne, Germany.
Stud Health Technol Inform. 2024 Aug 30;317:21-29. doi: 10.3233/SHTI240834.
Individual health data is crucial for scientific advancement, particularly for developing Artificial Intelligence (AI); however, sharing real patient information is often restricted due to privacy concerns. A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data while keeping patient information confidential. In this paper, we present the workflow and services developed in the context of Germany's National Data Infrastructure project NFDI4Health. First, two state-of-the-art AI tools for generating synthetic health data, VAMBN and MultiNODEs, are outlined. Further, we introduce SYNDAT, a public web-based tool that allows users to visualize and assess the quality and risk of synthetic data produced by the generative models of their choice. Finally, the utility of the proposed methods and the web-based tool is showcased using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Center for Cancer Registry Data of the Robert Koch Institute (RKI).
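To make the idea of assessing "quality and risk" concrete, the following is a minimal sketch of one plausible reading, not a description of how SYNDAT, VAMBN, or MultiNODEs work. It assumes quality is measured as the largest gap between the real and synthetic correlation matrices, and risk as the distance from each synthetic record to its closest real record. The function names, the toy Gaussian "generator", and the data are all hypothetical.

```python
import numpy as np


def correlation_gap(real, synth):
    """Largest absolute difference between the two Pearson correlation matrices."""
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synth, rowvar=False)
    return float(np.max(np.abs(real_corr - synth_corr)))


def distance_to_closest_record(real, synth):
    """Euclidean distance from each synthetic row to its nearest real row."""
    diffs = synth[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


rng = np.random.default_rng(0)

# Toy "real" data: two correlated numeric variables (hypothetical, not patient data).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
real = rng.multivariate_normal([0.0, 0.0], cov, size=500)

# Stand-in generator (an assumption): fit a Gaussian to the real data and sample from it.
synth = rng.multivariate_normal(real.mean(axis=0), np.cov(real, rowvar=False), size=500)

# Worst case for privacy: "synthetic" rows that are exact copies of real rows.
leaked = real[:5].copy()

gap = correlation_gap(real, synth)
dcr_synth = distance_to_closest_record(real, synth)
dcr_leaked = distance_to_closest_record(real, leaked)

print("correlation gap:", gap)
print("median distance to closest real record:", float(np.median(dcr_synth)))
print("distances for copied rows:", dcr_leaked)
```

A small correlation gap suggests the synthetic data preserves the dependency structure of the real data, while distances of zero flag synthetic records that reproduce a real record exactly. Real evaluations would typically also handle categorical and longitudinal variables, which this sketch ignores.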