Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH, 44195, USA.
PLMI's Center for Artificial Intelligence and Data Science, Cleveland Clinic, Cleveland, OH, USA.
Sci Rep. 2024 Oct 7;14(1):23312. doi: 10.1038/s41598-024-73608-0.
Access to healthcare data for machine learning (ML) is restricted by stringent regulations and other limitations. Synthetic data that mirrors the underlying properties of the real data is emerging as a promising way to overcome these barriers. We propose a fully automated synthetic tabular neural generator (STNG), which comprises multiple synthetic data generators and integrates an Auto-ML module to validate and comprehensively compare the synthetic datasets produced by the different approaches. An empirical study on twelve different datasets was conducted to demonstrate the performance of STNG. The results highlight STNG's robustness and its role in making validated synthetic healthcare data more accessible, addressing a critical barrier to ML applications in healthcare.
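The idea of running several generators and validating their outputs against one another can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from STNG: the two toy generators (`gaussian_generator`, `jitter_generator`), the nearest-centroid classifier standing in for an Auto-ML module, and the "train on synthetic, test on real" accuracy used as the comparison score. It uses only the Python standard library.

```python
import random
import statistics

def make_real_data(n, rng):
    """Toy stand-in for a real table: two numeric features and a binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

def gaussian_generator(real, n, rng):
    """Hypothetical generator A: per-class, per-feature Gaussian fit, then sampling."""
    by_label = {}
    for x, y in real:
        by_label.setdefault(y, []).append(x)
    stats = {
        y: [(statistics.mean(col), statistics.stdev(col)) for col in zip(*xs)]
        for y, xs in by_label.items()
    }
    labels = [y for _, y in real]  # sampling from this list preserves class balance
    out = []
    for _ in range(n):
        y = rng.choice(labels)
        out.append(([rng.gauss(m, s) for m, s in stats[y]], y))
    return out

def jitter_generator(real, n, rng, scale=0.5):
    """Hypothetical generator B: resample real rows and add Gaussian noise."""
    out = []
    for _ in range(n):
        x, y = rng.choice(real)
        out.append(([v + rng.gauss(0.0, scale) for v in x], y))
    return out

def fit_centroids(rows):
    """Fit a nearest-centroid classifier (a simple stand-in for an Auto-ML model)."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: [statistics.mean(col) for col in zip(*xs)] for y, xs in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def tstr_accuracy(synthetic, real_test):
    """Train on synthetic rows, test on held-out real rows."""
    centroids = fit_centroids(synthetic)
    hits = sum(predict(centroids, x) == y for x, y in real_test)
    return hits / len(real_test)

def compare_generators(generators, real_train, real_test, n, seed=0):
    """Score every generator with the same metric and report the best one."""
    scores = {}
    for name, generate in generators.items():
        synthetic = generate(real_train, n, random.Random(seed))
        scores[name] = tstr_accuracy(synthetic, real_test)
    best = max(scores, key=scores.get)
    return scores, best

rng = random.Random(42)
real = make_real_data(400, rng)
real_train, real_test = real[:300], real[300:]
generators = {"gaussian": gaussian_generator, "jitter": jitter_generator}
scores, best = compare_generators(generators, real_train, real_test, n=300)
print(scores, best)
```

Scoring each synthetic dataset by how well a model trained on it performs on held-out real data is one common way to compare generators on a shared footing; a fuller validation would typically also examine statistical similarity to the real data and privacy risk.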