Department of Statistics, Purdue University, West Lafayette, IN, United States of America.
Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
PLoS One. 2023 Jul 6;18(7):e0280316. doi: 10.1371/journal.pone.0280316. eCollection 2023.
Clinical data sharing can facilitate data-driven scientific research, allowing a broader range of questions to be addressed and thereby leading to greater understanding and innovation. However, sharing biomedical data can put sensitive personal information at risk. This risk is usually addressed by data anonymization, which is a slow and expensive process. An alternative to anonymization is the construction of a synthetic dataset that behaves similarly to the real clinical data but preserves patient privacy. As part of a collaboration between Novartis and the Oxford Big Data Institute, a synthetic dataset was generated based on images from COSENTYX® (secukinumab) ankylosing spondylitis (AS) clinical studies. An auxiliary classifier Generative Adversarial Network (ac-GAN) was trained to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs), conditioned on the VU location (cervical, thoracic, or lumbar). Here, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity, and dataset privacy.