Nasimov Rashid, Nasimova Nigorakhon, Mirzakhalilov Sanjar, Tokdemir Gul, Rizwan Mohammad, Abdusalomov Akmalbek, Cho Young-Im
Artificial Intelligence, Tashkent State University of Economics, Tashkent 100066, Uzbekistan.
Department of Software Information Technologies, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan.
Bioengineering (Basel). 2024 Dec 18;11(12):1288. doi: 10.3390/bioengineering11121288.
The generation of synthetic medical data has become a focal point for researchers, driven by the increasing demand for privacy-preserving solutions. Existing generative methods rely heavily on real datasets for training, yet access to such data is often restricted. Statistical information about these datasets is more readily available, but current methods struggle to generate tabular data from statistical inputs alone. This study addresses that gap by introducing a novel approach that converts statistical data into tabular datasets using a modified Generative Adversarial Network (GAN) architecture. A custom loss function was incorporated into the training process to improve the quality of the generated data. The proposed method was evaluated using fidelity and utility metrics, achieving "Good" similarity and "Excellent" utility scores. Although the generated data may not fully replace real databases, it performs satisfactorily for training machine-learning algorithms. This work offers a promising solution for synthetic data generation when real datasets are inaccessible, with potential applications in medical data privacy and beyond.
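The abstract does not specify the form of the custom loss, so the following is only a minimal sketch of one plausible ingredient: a penalty that compares per-column statistics of a generated batch against published target statistics. The function names `stats_matching_loss` and `sample_batch`, the choice of mean and standard deviation as the matched statistics, and the Gaussian sampler standing in for a generator are all assumptions made for illustration, not the authors' method.

```python
import random
import statistics


def stats_matching_loss(batch, target_means, target_stds):
    """Average squared gap between batch column statistics and target statistics.

    Hypothetical helper: the paper's actual loss is not described in the abstract.
    `batch` is a list of rows, each row a list of numeric column values.
    """
    n_cols = len(target_means)
    loss = 0.0
    for j in range(n_cols):
        column = [row[j] for row in batch]
        mean_gap = statistics.fmean(column) - target_means[j]
        std_gap = statistics.pstdev(column) - target_stds[j]
        loss += mean_gap ** 2 + std_gap ** 2
    return loss / n_cols


def sample_batch(means, stds, n_rows, rng):
    """Stand-in for a generator: draws independent Gaussian columns (an assumption)."""
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative target statistics for two hypothetical numeric columns.
    target_means, target_stds = [120.0, 27.5], [15.0, 4.0]
    matched = sample_batch(target_means, target_stds, 2000, rng)
    mismatched = sample_batch([100.0, 20.0], [5.0, 1.0], 2000, rng)
    print("matched batch loss:   ", stats_matching_loss(matched, target_means, target_stds))
    print("mismatched batch loss:", stats_matching_loss(mismatched, target_means, target_stds))
```

In a GAN setting, a term of this kind would typically be added to the generator's adversarial loss with a weighting factor, so that the generator is pushed toward the known statistics even when no real rows are available to the discriminator. Whether the authors combine the terms this way is not stated in the abstract.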